ABSTRACT

This book is intended primarily for classroom 
teachers and other personnel who work directly with teachers in 
selecting reading readiness tests or achievement tests. The first 
chapter lists and briefly explains the criteria used by the authors 
in reviewing the tests included. These criteria are concerned with 
norms, standardization, objectivity, ease of administration and 
scorability, validity, reliability, and the test manual. The reading 
readiness tests reviewed are the Gates-MacGinitie Readiness Skills 
Test, the Harrison-Stroud Reading Readiness Profile, the Lee-Clark 
Reading Readiness Test, the Metropolitan Readiness Test, and the 
Murphy-Durrell Reading Readiness Analysis. The following reading 
achievement tests are reviewed: the California Reading Tests, the 
Gates-MacGinitie Reading Tests, the Iowa Silent Reading Test, the 
Metropolitan Achievement Tests (Reading), and the Stanford Achievement 
Tests (Reading). The appendix contains two charts, one a general 
description of the tests reviewed and the other a summary of the 
technical evaluation of the tests. (TO) 
FOREWORD



Teachers, supervisors, and administrators are often faced with the task 
of selecting and administering tests and interpreting their results. That 
tests do perform a useful function is indicated by their widespread use. 
Nevertheless, the user is faced with a number of frustrations in selecting 
and interpreting tests, relating most often to his relative lack of background 
in tests and measurement. Evaluation is the key to the practical value of 
this publication. The authors present more than a simple review of tests 
and their manuals. They react critically to what is presented and to what is 
omitted. They point out the limitations of specific tests as well as the 
strengths. The person who has serenely accepted tests and their manuals at 
face value will be surprised and at times perhaps even shocked by what he 
reads. The objective of providing a useful aid for the reading teacher has 
been well met with this publication. 

The International Reading Association is also publishing an extensive 
evaluation and review of the research on tests and measurement in reading, 
which will appear in the ERIC/CRIER Reading Review Series. This volume 
is also authored by Roger Farr. Two other titles in IRA's Reading Aids 
Series relate to evaluation in reading and may be of interest to the readers 
of this bulletin: Informal Reading Inventories, by Marjorie Johnson and 
Roy Kress, and Evaluating Reading and Study Skills in the Secondary 
Classroom, by Ruth Viox. 

Leo Fay, President
International Reading Association 
1968-1969 
The International Reading Association attempts through its 
publications to provide a forum for a wide spectrum of 
opinion on reading. This policy permits divergent viewpoints 
without assuming the endorsement of the Association. 



Chapter 1 



CRITERIA FOR REVIEWING TESTS 

• Why Such A Book As This One? 

THIS book is intended primarily for classroom teachers and other 
personnel who work directly with teachers in selecting reading readiness or 
achievement tests. One may ask: does one really need a guide to select a 
test? All readers have probably had a course in tests and measurements and 
know the general rules for selecting an achievement test. However, many 
had the course before actually teaching, so that theory was too far removed 
from practice and was therefore not as useful as it could have been. More 
importantly, test development has advanced rapidly in both theory and 
practice in recent years. 

Selecting a reading readiness or achievement test is continually 
becoming a more complex task with these advancements. Test manufacturing 
has become a large-scale enterprise with attractive and highly 
promoted reading achievement, assessment, and diagnostic devices. Some 
of these instruments are based on new research evidence on how children 
learn to read. Other tests are designed specifically to measure experimental 
programs rather than the more traditional approaches. 

The computer has also made an impact on test construction. Rapid 
analyses of the statistical characteristics of a test are now possible. In the 
past it would have taken months or years to analyze the results of each 
item on a test given to a large sample of children. Using rapid analysis 
techniques, the computer has enabled test manufacturers to revise their 
tests more frequently, and the revision of old tests is based on more 
accurate and complete information about the effectiveness of each test 
question. 

Old tests, however, remain in the schools long after the curriculum has 
changed. These tests are outdated and no longer serve the purpose for 
which they were originally designed. Yet, on the other hand, some of the 
older tests are still the "best" that are currently available. How does a 
teacher choose among them? Selecting a test takes time and careful 
evaluation, more time than the classroom teacher can spare from his other 
instructional duties. This book is designed to review the major issues that 
should be considered before a test is chosen for use in a classroom. 
The authors have reviewed several of the most commonly used reading 
readiness and achievement tests currently available and have evaluated 
these instruments in terms of both their content and their statistical 
characteristics. An analysis of the research reports from the ERIC 
Clearinghouse on Retrieval of Information and Evaluation on Reading was 
used in an attempt to determine which reading tests were being used most often. 

These test reviews will, it is hoped, serve as a guide for evaluation in 
selecting the appropriate test for use in a specific classroom. This guide 
should reduce the time normally spent in evaluating a reading readiness or 
achievement test. The issues considered by the reviewers in evaluating the 
tests are the content measured by the test, its statistical properties, its 
scorability, the meaning of the subtest and total test scores, and whether 
the test adequately measures what it purports to measure. Although the 
results are summarized, it may be useful to review the purposes and uses of 
achievement tests. 

• Why Use a Commercially Prepared 
Reading Achievement Test? 

Prediction and Assessment 

One's observation of a child's daily performance is the main source for 
determining how well a child is doing. However, one will also want to 
make periodic controlled assessments of each child's current reading ability 
in order to place him at his appropriate instructional level. Teachers are 
aware that a child makes the most rapid progress when instruction is near 
his current level of mastery. Thus, tests help teachers make initial, rough 
assessments so that instruction can begin with a better probability of 
success. 

Teacher-made tests are one of the main sources of data about 
children in a classroom. Their results help one to predict future 
achievement, assess how well children have accomplished the goals, provide 
feedback to the child, and reinforce the student for what he has 
accomplished. However useful these results may be, teachers, parents, and 
administrators are prone to want some outside assessment of how well the 
students are doing when compared with a large sample of children of the 
same age and grade. Within a single class, a teacher has only a limited 
number of children against whom to compare how well that class or an 
individual student is progressing. Thus, commercially prepared tests are 
used to provide wider prediction and assessment of the pupils in a class. 

There are other uses of tests besides those listed previously. A school 
district may wish to look at the general achievement level of its students. 
This district assessment may help the administration make suggestions for 
program improvement, purchasing additional instructional aids and 
equipment, or providing additional personnel. In addition, tests are used for 
research purposes to evaluate the effectiveness of a new program or to 
compare two modes of instruction. All of the criteria to be described are 
relevant for these uses of tests as well. 
• Factors to be Considered in Choosing a Test 



Norms 

A commercially prepared test usually offers the advantage of having 
been administered to a large number of children from a wide variety of 
rural and urban centers. Usually these tests have been administered to 
children of various social, racial, and ability levels. Thus, the test will have 
been "normed" on a population of children from more than just one class, 
school district, or state. A description of the norming population is critical 
for an interpretation of test scores. If one has a bright, urban class and the 
test has been normed with average, inner-city children, the scores one's 
children obtain may indicate higher grade scores for that class than a 
realistic assessment would. If the reverse is true, that is, if the test was originally 
given to a large population of bright youngsters, the scores may be lower 
than a realistic appraisal of one's students' current status would indicate. 

Standardization 

Adequacy of prediction and assessment are pertinent considerations in 
selecting a test. Often this category is called standardization, a term which 
does not accurately describe one very important aspect of the test with which 
one is concerned. 

Clear, standardized directions on how the test is to be administered 
are desirable. A set of directions that is concise and uniform will ensure 
that the results are not depressed or inflated because the directions left 
the procedure unclear. The students' scores will not be as useful if the 
test is given in a different way from the way it was given to the 
norming population. 

Objectivity 

A commercially prepared test also is intended to be objective; i.e., the 
score achieved should not be biased in some way by the tester or observer 
of the child's demonstration of what he knows. Encouragement, as 
everyone knows, can guide a pupil to a right answer. This is an excellent 
instructional technique, as guided-discovery experiments have 
demonstrated. However, at times one will want to know not how much a pupil 
can learn but how much he has learned and where he is now. An objective 
measure should enable one to determine this. As will be seen, tests vary in 
their objectivity. 

Ease of Administration and Scorability 

Given enough time and personnel, a teacher might collect very 
extensive data about a child. This undertaking is not possible in most instances. 
Teachers want a test that makes reasonable demands in terms of the 
amount of time needed to administer it, so that children are not 
fatigued and the classroom instructional program may continue. In 
addition, tests that are difficult and tedious to score are sources of error 
and use far more teacher time than is desirable. Most achievement tests are 
designed to minimize the scoring time required of teachers. 

Validity 

The test selected should measure the content one is teaching. A reading 
readiness test should predict success in initial reading, not in group 
cooperative play, although the two behaviors may be related. In addition, the 
test should predict success in the classroom whether one uses a look-say 
approach or a more linguistically oriented program. 

A reading achievement test should sample the decoding, vocabulary, 
and comprehension skills taught. The titles of the tests should be an 
accurate description of the skills being tested. Proof should be given that 
the skills of the test were measured with the norming population. 

The test should provide evidence that the skills measured are either a 
measure of current status or are of predictive value. One should know 
which tests can be used to predict success or failure in subsequent 
instruction. Not all tests provide this kind of evidence. 

Three kinds of "validity" are important to consider. One is content 
validity, which assesses whether the test measures the content one is 
teaching. The second is concurrent validity, which compares the test behavior to 
current performance. The third is predictive validity, which tells whether 
the score the child receives can be used to predict how well he will do in 
the future. A fourth, more difficult kind of validity is construct validity, 
which refers to the psychological processes represented by the behaviors 
exhibited by the child during the test. For example, some reading tests 
claim that the comprehension skills measured on the test evaluate the 
child's ability to make inferences. Evidence should be offered by the test 
manufacturer that the questions on the test do measure this trait. 

Reliability 

When choosing a test one will want it to be a reliable measure of how 
much a child knows or how well he is able to apply his skills. The test 
results should not be a chance score, with a child obtaining a high score by 
luck, guessing, or other factors. The test should not be constructed so that 
it gives the advantage to children who know only one thing well. The test 
should be constructed so that one has confidence that the score the child 
receives today will be similar to the score he would receive if the test were 
readministered to the same child tomorrow. 
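One common way to put a number on this kind of consistency is a test-retest reliability coefficient: give the same test to the same children on two occasions and correlate the two sets of scores. The Python sketch below is only an illustration under stated assumptions: the scores are invented, and `pearson_r` is a hypothetical helper written for this example, not part of any publisher's materials. A coefficient near 1.0 means the children keep roughly the same standing from one day to the next; a low coefficient suggests that chance is playing a large part in the scores.

```python
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mx, my = mean(xs), mean(ys)
    # Population covariance divided by the product of population standard deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


# Hypothetical raw scores for the same six children tested on two different days.
day_one = [12, 18, 25, 30, 34, 41]
day_two = [14, 17, 27, 29, 36, 40]
r = pearson_r(day_one, day_two)
print(f"test-retest r = {r:.2f}")
```

With only six children the coefficient is itself unstable; the manual's reliability figures, when given, should rest on far larger samples.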

The Test Manual 

It is the professional responsibility of the test maker to provide 
sufficient and appropriate evidence for the user to judge whether a test fits his 
purposes. Descriptions of administration, norming, scoring, reliability, and 
validity should be provided in the user's manual. The authors have used 
the test manuals to evaluate the evidence provided and to assess in what 
ways the test can be recommended for use. 
• How Can One Use Test Results? 



Most achievement tests are group tests and provide only a rough 
assessment of how a child compares with the norming sample. Such tests are not 
meant to be diagnostic, nor are they meant to give an accurate assessment 
of instructional reading levels. They are a rough and ready means of grouping 
children for reading instruction. The grade placement score has little 
instructional value. The percentile score is more useful but again requires 
careful interpretation. If a test is used over a period of time, class norms 
may be built for a particular school district. 
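Building such local norms can be as simple as pooling the raw scores a district has collected over several years and reporting each new score as a percentile rank within that pool. The sketch below assumes one common definition of percentile rank (the percentage of scores below the given score, plus half of any tied scores); published norm tables may use a different convention, and the district scores shown are made up.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, local_scores):
    """Percent of local scores below `score`, counting half of any tied scores."""
    if not local_scores:
        raise ValueError("local_scores must not be empty")
    ordered = sorted(local_scores)
    below = bisect_left(ordered, score)          # scores strictly lower
    ties = bisect_right(ordered, score) - below  # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores accumulated by one district over several years.
district_scores = [22, 25, 25, 28, 31, 33, 35, 35, 35, 38, 40, 44]
print(percentile_rank(35, district_scores))  # 62.5
```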

One of the greater misuses of the group standardized reading tests is the 
use of grade level norms as an indication of the level at which a student 
ought to be given reading instruction. Because of the nature of 
standardized tests, they are not appropriate for determining the reading level at 
which the youngster can profitably receive instruction. Standardized reading 
tests are developed from a group of items which are administered to a 
particular norming group; the grade norm is based on the average number 
of items that students get correct at a particular grade level. For example, a 
score of 6.0 would only indicate that the average youngster who is just beginning 
sixth grade had 100 items correct. This score does not mean that the 
student who had 100 items correct can necessarily read 6.0 grade level 
material. The standardized tests were not meant to be criterion tests. 
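To see why a grade score says so little about the material a child can handle, it helps to look at how such a score is derived. The sketch below assumes a hypothetical norm table pairing each grade placement with the mean raw score of the norming group at that placement, and converts a raw score to a grade equivalent by straight-line interpolation between table entries. This is one simple method chosen for illustration; an actual publisher's procedure may differ. Nothing in the calculation refers to the difficulty of any reading material: the result only locates the raw score among the norm group's averages.

```python
def grade_equivalent(raw_score, norm_table):
    """Convert a raw score to a grade-equivalent score.

    norm_table: (grade_placement, mean_raw_score) pairs; mean raw scores are
    assumed to rise with grade. Scores outside the table are clamped to its ends.
    """
    points = sorted(norm_table)
    if raw_score <= points[0][1]:
        return points[0][0]
    for (g_lo, s_lo), (g_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= raw_score <= s_hi and s_hi > s_lo:
            # Straight-line interpolation between neighboring grade placements.
            return g_lo + (g_hi - g_lo) * (raw_score - s_lo) / (s_hi - s_lo)
    return points[-1][0]


# Hypothetical norm table: mean raw score of the norming group at each grade.
NORMS = [(4.0, 70), (5.0, 85), (6.0, 100), (7.0, 112)]
print(grade_equivalent(100, NORMS))  # 6.0
print(grade_equivalent(106, NORMS))  # 6.5
```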

What we are suggesting is a procedure that might be used to determine 
the level at which a youngster may be given instruction on the basis of his 
standardized reading test score. Betts, in his 1942 book Foundations of 
Reading Instruction, suggested three functional reading levels. These 
functional reading levels are based on work that he and Patrick Killgallon 
had done. Credit also is given to Thorndike for the idea. 

The three functional reading levels are as follows: 1) the independent 
reading level, the level at which a youngster should be doing his leisure-time 
reading; 2) the instructional reading level, the level at which the 
youngster should be given reading instruction and should be learning in the 
content areas; 3) the frustration level, the reading level which is too 
difficult for the youngster and which will probably lead to negative 
conditioning to reading. 

The independent level is identified by 99 percent or better correct word call, 
90 percent or better comprehension, and freedom from behavioral 
symptoms of tension and anxiety. The instructional level is identified by 
95 percent or better correct word call, 75 percent or better comprehension, 
and only slight signs of anxiety. The frustration level implies 90 
percent or less correct word call, less than 75 percent comprehension, and 
symptoms of nervousness, anxiety, and frustration. 
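The percentage criteria above can be written as a small decision procedure. The sketch below is a hypothetical helper, not a published scoring rule: it checks only the two percentages, leaves the behavioral signs of tension and anxiety to the teacher's judgment, and reports "indeterminate" for combinations the criteria do not cover (for example, 91 to 94 percent correct word call with adequate comprehension).

```python
def functional_level(word_call_pct, comprehension_pct):
    """Classify one informal-reading-inventory passage from two percentages.

    Behavioral symptoms (tension, anxiety) are not modeled here.
    """
    if word_call_pct >= 99 and comprehension_pct >= 90:
        return "independent"
    if word_call_pct >= 95 and comprehension_pct >= 75:
        return "instructional"
    if word_call_pct <= 90 or comprehension_pct < 75:
        return "frustration"
    # e.g., 91-94 percent word call with 75+ percent comprehension: criteria leave a gap
    return "indeterminate"


print(functional_level(96, 80))  # instructional
```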

A grade level score from a standardized reading test more often than 
not places a youngster at his frustration reading level. This relationship, of 
course, is dependent on the particular standardized test that is used and 
the particular material which is used for the informal reading inventory. 
A procedure which might be used by classroom teachers to determine 
the functional reading levels (i.e., independent, instructional, and 
frustration) that correspond to various scores on the standardized tests would 
work something like this: 

The teacher would administer the standardized test to his class. 
He would then administer an informal reading inventory (IRI) to some of 
his students; the informal reading inventory should preferably be based on 
the basal reader which he is using for instruction. Youngsters to be 
tested with the IRI would be selected from several points along the range 
of scores students achieved on the standardized tests. Students should be 
selected for testing at least from the bottom, middle, and top of the range 
of scores. Additional points on the range could be sampled if time allowed. 
The teacher would then determine the relationship between various raw 
scores on the standardized reading tests and the functional reading levels 
on the informal reading inventory. After he has gathered data of this sort 
for several classes, he would not find it necessary to readminister the 
informal reading inventory but could use the past performance of students 
to determine the levels at which they ought to be given instruction. 

These procedures would enable the teacher to determine the 
functional reading levels that correspond to a particular 
raw score on a particular standardized reading test. For example, a student 
who scores 121 raw score points on a standardized reading test might have 
a fourth grade independent reading level, a fifth grade instructional reading 
level, and a sixth grade frustration level. Such knowledge would allow the 
teacher to utilize the standardized test scores to place each student at the 
instructional reading level where he would have the greatest opportunity 
to succeed. 
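The record keeping this procedure calls for can be sketched in a few lines. The example below assumes the teacher has stored, for each sampled raw score, the independent, instructional, and frustration grade levels found with the IRI; the data and the function name are hypothetical. For a new student it simply borrows the levels of the sampled student with the nearest raw score, choosing the lower score when two are equally close. That is a conservative and deliberately simple rule; a teacher with more data points might prefer to interpolate.

```python
def estimate_levels(raw_score, observed):
    """Return (independent, instructional, frustration) grade levels for a raw
    score, borrowed from the sampled student with the nearest raw score.

    observed: dict mapping raw score -> (independent, instructional, frustration).
    Ties in distance go to the lower raw score.
    """
    if not observed:
        raise ValueError("no IRI observations recorded yet")
    nearest = min(observed, key=lambda s: (abs(s - raw_score), s))
    return observed[nearest]


# Hypothetical IRI results for students sampled from the bottom, middle,
# and top of the class's range of standardized-test raw scores.
iri_results = {84: (2, 3, 4), 121: (4, 5, 6), 150: (5, 6, 7)}
print(estimate_levels(118, iri_results))  # (4, 5, 6)
```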

• Plan of this Reading Aid 

Each test included in this review was assessed using the following 
outline: 

I. Test overview 

A. Title 

B. Author(s) 

C. Publisher 

D. Date of publication: original, revised

1. Manual 

2. Test 

E. Level and Forms 
1. Grade level

2. Individual or group

3. Number of forms available 

F. Administration Time 

G. Scoring: hand or machine scorable

H. Cost 

1. Question booklets: consumable or not

2. Answer sheets

3. Manual

II. Evaluation of Subtests and Items

A. Description of subtests 

1. Given a meaningful name that describes the test adequately?

2. Is each subtest long enough to provide usable results?

3. Sequential development of each subtest logical, and transitions 
smooth?

B. Author's purpose reflected in selection of items

C. Scoring ease and usability of tables

D. Directions: clarity and level of language appropriate to grade level

E. Design: format, currentness, printing, legibility, pictures

F. Readability

III. Evaluation of Reliability and Validity

A. Norming population 

1. Size 

2. Age, grade, sex 

3. Range of ability

4. Socioeconomic level

5. Date of administration

B. Validity 

1. Content validity

a. Face validity 

b. Logical or sampling validity

2. Empirical validity

a. Concurrent 

b. Predictive 

3. Construct validity 

a. Construct and theory of which construct is a part clearly 
defined 

b. Discriminant or convergent validity evidence

c. Significant difference found in performance between 
groups which have varying degrees of this trait?

4. Does reported validity appear adequate in relation to 
author's stated purpose? Why or why not?

Following this review, tables (refer to Appendix) were constructed to 
summarize the characteristics of the test for a quick-and-ready reference 
for use. 

Each test is described, and its strengths and weaknesses are delineated 
so that one may evaluate the test oneself. Each review was sent to the 
publisher for his reactions. In some cases, additional information was given 
to the authors, and this material was included in the review. If the necessary 
data were not located in the manual but were found elsewhere, the appropriate 
sources have been indicated. If the authors did not agree with the 
publishers' criticisms, this fact has been indicated so that potential test 
users can come to their own conclusions. It should be the teacher who 
makes the final decision on the use of a test based on his program; the 
authors can only guide and suggest the criteria by which that decision 
might be made. 

• What Is the Responsibility of Test Publishers? 

A test should be placed in the same category as a critical drug. A test 
should be used only after a careful study of its effects has been made. 
Evidence should be provided that the test (or drug) will do what it 
purports to do. Too many critical decisions are made about a child based 
on his test scores to use any test that is not a reliable and valid assessment 
of the child's ability to do the task described by the test. A teacher should 
insist that test manufacturers provide him with the same reputable 
product that he would demand of a drug manufacturer who offers a new 
cure. It is better to use no test than to use an unreliable or invalid one. 
One finds that a number of tests are released before adequate data are 
available. 

Many tests have not been studied sufficiently before they are put on 
the market for sale. One hopes the reader will note these deficiencies and 
realize how serious it is to make an instructional, promotional, or 
evaluational decision about a child when it is not based on an accurate, 
stable, or predictive measure of his achievement. 
Chapter 2 



SELECTING READING READINESS TESTS 

READING readiness tests are administered in a majority of the 
elementary schools in the United States. The popularity of the tests alone 
suggests that they are useful to teachers in making decisions about 
children. What does the teacher who administers these tests hope to learn 
about the students? First, he would like to know if the score on the test is 
a valid predictor of whether a particular student is ready to begin formal 
reading instruction. In addition, the subtest scores on the readiness test are 
said to assist in diagnosing the readiness skills in which each student is 
weak or strong so that appropriate instruction can be planned. 

These two reasons should, therefore, be the prime considerations in 
evaluating a reading readiness test. One should seek evidence that relates to 
the predictive power of the test. Do students who score well on the test 
become good readers? Are these high scorers ready for formal reading 
instruction? Secondly, one should examine the subtests and items to 
determine if one agrees that these are the most important skills for a 
student to develop if initial reading instruction is to be successful. If it is 
decided that the skills from the test are appropriate, then one should look 
for evidence regarding the uniqueness of the subtests. In order to use the 
test in a diagnostic fashion, the publisher should provide evidence that the 
subtests deal with separate measurable skills. 

Further evaluation of a reading readiness test should include a more 
careful than usual examination of the testing procedures and the test 
format. Because reading readiness tests are used with such a young age 
group, the examinees can easily be penalized by an unusual test format or 
a lack of clarity in the examiner's testing procedures. 

Finally, one should also examine the usefulness of the test scores. This 
aspect is partly determined by the subtests included on the test, but it is 
also determined by the description of the use of the test provided by the 
publisher. One should feel confident in knowing what to do with the test 
scores. How do they relate to reading readiness? Do low scores mean that a 
student should not begin reading instruction? How should the subtest 
scores be used? All of these questions should be answered by the publisher 
in a clear statement. In addition, the publisher should discuss the 
relationship of the readiness skills measured by his test to the readiness skills or 
child behaviors which cannot be measured by a test. Lack of such a 
discussion will seriously limit the use of the test. 



It is believed that the most important use of reading readiness tests is to 
determine which readiness skills need further development before the 
students can begin to learn to read. However, the predictive validity of a 
readiness test should help one to determine the importance of various 
readiness skills. Most publishers of reading readiness tests do describe such 
validity. 

Regardless of how much evidence is provided, a simple check on how 
well the test predicts for a class can be obtained. After administering the 
test and beginning instruction, one should save the test scores until the end of the 
year and then compare the reading level obtained at the end of the 
year with the readiness score. Is it in general agreement with the class's 
achievement? Are there some students who failed to do as well as 
predicted? Are there some who did better? Can the reason for this be 
determined? 
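This end-of-year check can also be made a little more systematic. The sketch below, offered only as an illustration, fits an ordinary least-squares line predicting the end-of-year achievement score from the readiness score for one class, then lists the students (numbered as they might be on a scattergram) whose achievement differs from the line's prediction by more than a margin the teacher chooses. The margin, the data, and the function name are assumptions for this example; with a single class the fitted line is only a rough guide.

```python
from statistics import mean


def flag_surprises(readiness, achievement, margin):
    """Fit a least-squares line predicting achievement from readiness and return
    (student_number, residual) for students farther than `margin` from the line.
    A positive residual means the student did better than predicted."""
    mx, my = mean(readiness), mean(achievement)
    sxx = sum((x - mx) ** 2 for x in readiness)
    if sxx == 0:
        raise ValueError("readiness scores must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(readiness, achievement))
    slope = sxy / sxx
    intercept = my - slope * mx
    flagged = []
    for number, (x, y) in enumerate(zip(readiness, achievement), start=1):
        residual = y - (intercept + slope * x)
        if abs(residual) > margin:
            flagged.append((number, round(residual, 2)))
    return flagged


# Hypothetical class of five: readiness raw scores and end-of-year grade scores.
readiness = [10, 20, 30, 40, 50]
achievement = [1.0, 1.5, 3.0, 2.5, 3.0]
print(flag_surprises(readiness, achievement, margin=0.5))  # [(3, 0.8)]
```

Student 3 did noticeably better than the readiness score predicted, which is exactly the kind of case worth asking "Can the reason for this be determined?"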

The use of a scattergram (Figure 1) will help one to visualize how well a 
particular reading readiness test predicts reading achievement for students 
in a specific instructional program. The scattergram is developed by 
plotting the intersection of each child's readiness test score with his score 
on a subsequently administered reading achievement test. The readiness 
test may have been administered at the beginning of first grade and the 
reading achievement test at the end of first grade, but this timing 
depends on the period for which one would like to predict. The 
scattergram will be more useful if one uses a number to represent each student 
rather than an "X" for all students. If the test publisher provides standard 
scores or percentiles, it would be better to use these rather than raw scores 
in plotting the scattergram. 

Figure 1. Scattergram: Reading Readiness and Reading Achievement Scores
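Where graph paper is not at hand, the same kind of plot can be produced as text. The sketch below is a hypothetical helper: it divides the range of readiness scores into columns and the range of achievement scores into rows, and prints each student's number, rather than an "X", in the cell where his two scores meet. It is deliberately crude: two students who fall in the same cell overwrite each other, so a larger class would call for a finer grid. The scores are invented.

```python
def text_scattergram(readiness, achievement, cols=10, rows=6):
    """Return text lines with each student's number placed where his readiness
    score (across) meets his achievement score (up)."""
    def cell(value, low, high, n):
        # Map a score onto one of n equal-width bins covering [low, high].
        if high == low:
            return 0
        return min(n - 1, int((value - low) / (high - low) * n))

    x_lo, x_hi = min(readiness), max(readiness)
    y_lo, y_hi = min(achievement), max(achievement)
    grid = [["  ." for _ in range(cols)] for _ in range(rows)]
    for number, (x, y) in enumerate(zip(readiness, achievement), start=1):
        row = rows - 1 - cell(y, y_lo, y_hi, rows)  # row 0 is the top (highest achievement)
        col = cell(x, x_lo, x_hi, cols)
        grid[row][col] = f"{number:>3}"  # a later student in the same cell overwrites
    return ["".join(line) for line in grid]


# Hypothetical readiness and end-of-year achievement scores for ten students.
readiness = [35, 52, 41, 60, 28, 47, 22, 55, 38, 66]
achievement = [1.6, 2.1, 1.9, 2.4, 1.4, 1.7, 1.2, 2.3, 1.8, 2.6]
print("\n".join(text_scattergram(readiness, achievement)))
```

Numbers clustering along a line from lower left to upper right indicate that the readiness test is predicting achievement well for that class.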

The scattergram includes ten students, each represented by a different 
number; the scores indicate that the readiness test does predict reading 
achievement scores fairly well. For example, child number seven scored 
somewhat poorly on both the readiness test and the achievement test; child 
number ten scored fairly high on both tests. 

The charts in the Appendix describe and evaluate several of the major 
aspects of the readiness tests reviewed. A more complete evaluation of 
each test follows. 

• Gates-MacGinitie Readiness Skills Test 
Overview 

The Gates-MacGinitie Readiness Skills Test is a revision of the Gates 
Reading Readiness Test. The new test, published in 1968, is intended for 
use with pupils at the end of kindergarten or the beginning of first grade. 
Eight subtests are included, but only the first seven are combined to arrive 
at a total readiness score. The seven required subtests are Listening 
Comprehension, Auditory Discrimination, Visual Discrimination, Following 
Directions, Letter Recognition, Visual-Motor Coordination, and Word 
Recognition. 

Student responses are recorded in the test booklet. The pictures and 
words are large and easy to read; however, some device to help a 
student keep his place would probably aid in the administration of the 
test. The directions for the examiner and the oral directions to the 
examinees are adequate in terms of clarity, completeness, and appropriateness 
for kindergarten and first grade students. A separate scoring key is 
provided, and tables are included for end of kindergarten and beginning of 
first grade; however, the norm group is not described for either of these 
populations. The tables provide stanine scores for the subtests and total 
score as well as a percentile score for the total score. 
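Stanines divide a normal distribution of scores into nine bands containing 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of the cases. The Gates-MacGinitie tables give stanines directly, so no calculation is needed with this test; the sketch below only shows how a stanine relates to a percentile rank. It assumes that a percentile rank falling exactly on a band boundary goes to the lower stanine, a convention that a particular publisher's tables may not follow.

```python
from bisect import bisect_left

# Cumulative percentages at the top of stanines 1 through 8
# (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of cases in stanines 1-9).
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank):
    """Return the stanine (1-9) whose band contains the given percentile rank."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return bisect_left(STANINE_UPPER_BOUNDS, percentile_rank) + 1


print(stanine_from_percentile(50))  # 5
```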

The total readiness score is arrived at by multiplying each of the subtest 
scores by a weighting factor of from one to three. This scoring procedure 
is used because the test authors feel that certain subtests are more 
predictive of later reading achievement than others. The Letter Recognition 
score, for example, is multiplied by three, while the Listening Comprehension 
score is multiplied by only one. This procedure was developed by analyzing 
data from the standardization of the test. However, no information 
about this study is reported in the manual, and the reading achievement 
test which was used as the criterion test was not named. Because of this 
limitation, it is difficult to determine the value of the weighting procedure. 
If one utilizes this test, it would be useful to compare the unweighted total raw score 
with the weighted total score to determine which is the better predictor of 
later reading achievement with one's classes. 
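The comparison suggested here can be carried out with a class's own scores. The sketch below computes both totals for each pupil and correlates each with a later reading achievement score. Only two weights come from the text (Letter Recognition times three, Listening Comprehension times one); the other five in `HYPOTHETICAL_WEIGHTS` are placeholders that should be replaced with the values given in the test's directions, and the pupil data are invented.

```python
from statistics import mean, pstdev

# Letter Recognition (x3) and Listening Comprehension (x1) are stated in the text;
# the remaining weights are placeholders within the stated one-to-three range.
HYPOTHETICAL_WEIGHTS = {
    "Listening Comprehension": 1,
    "Auditory Discrimination": 2,
    "Visual Discrimination": 2,
    "Following Directions": 2,
    "Letter Recognition": 3,
    "Visual-Motor Coordination": 2,
    "Word Recognition": 2,
}


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def weighted_total(subscores, weights):
    """Sum of each subtest raw score multiplied by its weight."""
    return sum(weights[name] * score for name, score in subscores.items())


def compare_predictors(pupils, later_achievement, weights):
    """Return (r for unweighted total, r for weighted total) against later achievement."""
    unweighted = [sum(p.values()) for p in pupils]
    weighted = [weighted_total(p, weights) for p in pupils]
    return pearson_r(unweighted, later_achievement), pearson_r(weighted, later_achievement)


# Invented data: only two subtests per pupil, to keep the example short.
pupils = [
    {"Listening Comprehension": 10, "Letter Recognition": 2},
    {"Listening Comprehension": 2, "Letter Recognition": 8},
    {"Listening Comprehension": 6, "Letter Recognition": 6},
    {"Listening Comprehension": 4, "Letter Recognition": 10},
]
end_of_year = [1.2, 1.8, 1.6, 2.0]
raw_r, weighted_r = compare_predictors(pupils, end_of_year, HYPOTHETICAL_WEIGHTS)
print(f"unweighted r = {raw_r:.2f}, weighted r = {weighted_r:.2f}")
```

With real classes, the larger of the two coefficients suggests which total is the better predictor locally, though a handful of pupils is far too few to settle the question.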
Norms 

While norms are provided for both the end of kindergarten and the 
beginning of first grade, there is no descriptive information on the norming 
population. There is not even an indication of the total number of cases 
included in the sample. There is also no reference made to any technical 
information available from the publisher. For these reasons, it would be 
very unwise to use the test norms provided by the publisher. One would be 
comparing one's students to a completely unknown population, and this 
knowledge would not be of any value in determining whether these students 
are ready to begin reading instruction. 

Validity 

The content validity of the test appears to be appropriate for measuring 
many of the skills necessary to beginning formal instruction in reading. 
Several of the pictures seem to be biased toward a middle-class population. 
Ethnic differences are represented with several pictures of Negro children. 
The results of using the test with certain cultural groups would aid in 
determining the validity of the test with these special groups. The authors 
also encourage the use of teacher observations and informal tests for 
measuring other aspects of the pupils' development. For these reasons, the 
content validity of the test is quite satisfactory; however, there is a 
complete lack of any other validity evidence. This condition would make any 
diagnostic or predictive use one might make of the tests completely 
dependent on the information gathered with one's own classes. 

Reliability 

The authors discuss some of the pertinent factors related to reliability,
such as the higher reliability when test scores are combined rather
than used separately, the higher reliability with relatively longer tests, the
higher reliability of scores in the middle of a range of scores when
compared to scores at either extreme, and the relatively high unreliability of
differences between test scores. This information is well presented and
should be considered by the test user; but the publisher does not give any
information about the reliability of the total test or subtests, and,
therefore, there is no basis on which to determine if the score a pupil receives
on one day is likely to be the same as the score he would receive on
another day.
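The classical formula for the reliability of a difference score makes the point about differences concrete. The sketch assumes that the two scores have equal variances. The coefficients in the example are invented, because the manual reports none. Two subtests that are each reliable at .80 but correlate .60 with each other yield a difference whose reliability is only .50.

```python
def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y between two test scores.

    Classical test theory formula, assuming X and Y have equal variances:
        r_dd = ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)
    r_xx, r_yy: reliabilities of the two scores; r_xy: their intercorrelation.
    """
    if r_xy >= 1:
        raise ValueError("perfectly correlated scores have no reliable difference")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


if __name__ == "__main__":
    # Hypothetical values: two fairly reliable subtests that correlate
    # moderately with each other.
    print(round(difference_reliability(0.80, 0.80, 0.60), 2))  # 0.5
```

The more closely two subtests correlate, the less reliable the difference between them becomes, even when each subtest is reliable by itself.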

Evaluation of Subtests and Items 

The names of the subtests adequately describe the tasks. Each test is
arranged in a logical order, and generally the tasks become more difficult
as each test progresses. The use of letters and words in the Visual
Discrimination and Visual-Motor Coordination subtests is in keeping with
the trend away from the geometric shapes used in earlier readiness tests.
This procedure will probably increase the predictive validity of the tests
because the tasks more closely resemble actual reading behavior.
The authors caution against the use of separate subtest scores and
suggest that the total test score is more useful. The writer would strongly
support this advice because of the lack of reliability and validity evidence
for the subtests. The writer would also caution against any diagnostic use of
subtest scores, even when stanine scores differ by as much as three stanines
as the test authors suggest, for two reasons: several of the subtests are quite
short, varying in length from eighteen to twenty-four items, and the
norming population on which these stanines are based is a completely
unknown quantity. The test authors also encourage the thoughtful
interpretation of any student's scores and suggest that reading readiness test
scores are quite dependent on the teacher's instructional procedures and
everything he knows about the children in his class. It is very refreshing to
see such a statement printed in a test manual.

The eighth subtest, which is not used in arriving at a total readiness
score, is Word Recognition. This test is actually a reading achievement test
and can be used by the teacher to identify those children who have already
begun to learn to read. While the test is a useful addition, it is quite
probable that the alert teacher would not need such a test to identify the
student in his class who had already begun to learn to read.

Summary 

This test appears to have content validity for measuring many of the
skills which are necessary to begin reading instruction. The authors point
out the shortcomings of readiness tests and generally do an adequate job of
describing the value of the test; but the lack of complete validity,
reliability, and norming data makes the test of very limited use to the teacher.
The test could be used as a criterion test for determining achievement levels
for certain readiness skills, but it is probable that the subtests are too short
to give valid or reliable information. This test appears to be one that was
published and made available for sale before validity and reliability data
were collected. A more complete test manual would also enhance the
value of the test.

• The Harrison-Stroud Reading Readiness Profile
Overview 

The Harrison-Stroud Reading Readiness Profile is presented in three
booklets and was revised for publication in 1956. According to the
authors, the test is designed to measure those skills which are necessary for
beginning reading. Six subtests are included: the first five can be
administered on a group basis, but the sixth must be administered individually.
The subtests include Using Symbols, Making Visual Discriminations, Using
the Context, Making Auditory Discriminations, Using Context and
Auditory Clues, and Giving the Names of the Letters.

Students' responses are written in the test booklet; only one form of
the test is available. Scoring is somewhat difficult because no separate
scoring key is provided; the examiner must search through the manual for
the scoring key for each subtest. For convenience, tables for converting
raw scores into percentile ranks are printed directly on the front page of
the first test booklet. The examiner's directions are clear and precise, and
the language of the oral directions is also appropriate for kindergarten and
first grade children. The format of the test is attractive and efficient;
colored boxes are utilized as place-keeping devices. The use of three colors
is functional in giving the directions for each item of the test. The layout
of the questions on the children's pages is spacious and clear.

Norms 

The norms for interpretation of raw scores were based on 1,400 pupils
in thirty-two communities in twenty-eight states in 1955. This population
is not adequately described as to range of ability, sex, chronological age,
and socioeconomic level. This is a serious weakness of the test and would
make the use of the norms very questionable. In addition, norms from 1955
seem to be a dubious basis for evaluating the performance of
children today. The authors do not suggest the development of local
norms, but this procedure would seem to be essential for interpretation of
raw scores because of the cited limitations.
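Developing local norms need not be elaborate. The following minimal sketch assumes that a school or district has the raw scores of its own pupils on a subtest. It converts those scores into local percentile ranks using the common mid-point definition: the percentage of pupils scoring below a raw score plus half the percentage obtaining exactly that score. The scores shown are invented.

```python
from bisect import bisect_left, bisect_right


def local_percentile_ranks(raw_scores: list[int]) -> dict[int, float]:
    """Build a local norm table mapping each raw score to a percentile rank.

    Mid-point definition: percent of pupils below the score plus half the
    percent of pupils obtaining exactly that score.
    """
    if not raw_scores:
        raise ValueError("need at least one score to build norms")
    ordered = sorted(raw_scores)
    n = len(ordered)
    table = {}
    for score in set(ordered):
        below = bisect_left(ordered, score)             # pupils scoring lower
        equal = bisect_right(ordered, score) - below    # pupils at this score
        table[score] = 100 * (below + 0.5 * equal) / n
    return table


if __name__ == "__main__":
    # Hypothetical raw scores on one subtest from a single school district.
    district_scores = [10, 12, 12, 14, 15, 15, 15, 18, 20, 22]
    for score, pr in sorted(local_percentile_ranks(district_scores).items()):
        print(f"raw {score:>2}: percentile rank {pr:.0f}")
```

Local tables like this compare a pupil only with classmates in the same district, which is exactly what makes them interpretable when the publisher's norm group is poorly described.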

Validity 

The content validity of the test is based on the logical assumption that
the test measures the skills essential to beginning reading. All of the
subtests appear to achieve this purpose. The test lacks other validity evidence.
It is believed that the most important use of a reading readiness test is to
diagnose students' strengths and weaknesses relevant to beginning reading
instruction, but the test authors present no evidence regarding the
diagnostic or subtest validities of the test. A second use of readiness tests is
the prediction of later reading achievement, and this test includes no
predictive validity evidence.

Reliability 

The complete lack of reliability data is one of the main weaknesses of
the manual of this test. The manual gives no evidence regarding the
reliability of the total test or of the subtests.

Evaluation of Subtests and Items

The names of the subtests are meaningful and adequately describe the
tasks involved. The tasks on the subtests are consistent with the authors'
stated concept of the nature of reading readiness. They list eight factors
that are important in reading readiness, and all of these factors are
included in the subtests. The authors suggest that other factors, such as
intelligence test scores and teacher observations, should also be used in
determining instructional group placement; but the student behaviors the
teacher is to observe are neither described nor discussed. The most serious
problem is the lack of a total readiness score. The authors defend this
procedure on the grounds that the tested skills do not develop evenly in
children, and they therefore recommend that the subtests be used
diagnostically. However, the use of subtest scores in this manner would require
evidence of the distinctness of each subskill and also evidence regarding
the reliability of each subtest, but none is given. The authors also suggest
the use of percentile levels for grouping pupils for instruction based on
performance on the subtests, but they treat each subtest as being of equal
importance in this procedure.

The Using Symbols and Visual Discrimination subtests utilize words
rather than the geometric shapes used by many readiness tests. It is believed
that using words is the more valid procedure because it more closely
resembles actual reading behavior. Most of the items on the test do not
appear to be extremely biased toward a middle-class culture; however,
evidence regarding this conclusion is not available. Some of the items are a
bit dated but do not appear as though they would interfere with the
pupil's determination of the correct responses.

The lack of reliability data for the subtests is compounded by the authors'
suggestion that the subtests be used diagnostically and also by the
relatively short length of most of the subtests. The numbers of possible
correct items on the six subtests are 22, 14, 16, 18, 16, and 18.
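The Spearman-Brown prophecy formula shows why short subtests are a concern. It predicts how reliability changes when a test is lengthened with comparable items. The manual gives no subtest reliabilities, so the value of .60 used below for the 14-item subtest is purely an assumption chosen for illustration.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is made n times as long
    with comparable items (Spearman-Brown prophecy formula)."""
    return n * r / (1 + (n - 1) * r)


def length_factor_needed(r: float, target: float) -> float:
    """How many times longer a test must be to reach a target reliability."""
    return target * (1 - r) / (r * (1 - target))


if __name__ == "__main__":
    # Hypothetical: assume the 14-item subtest has a reliability of .60.
    items, r = 14, 0.60
    print(round(spearman_brown(r, 2), 2))  # doubled to 28 items -> 0.75
    n = length_factor_needed(r, 0.90)
    print(round(n, 1), round(items * n))   # 6.0 84
```

Under these assumptions, the subtest would have to grow from 14 to about 84 items to reach a reliability of .90. This illustrates why very short subtests are poorly suited to individual diagnosis.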

Summary 

The test does have face validity for the diagnosis of reading readiness
skills. The lack of reliability and validity evidence seriously limits the value
of the test. The use of the test norms is not recommended because of the
limited description of the norming population. The writer suggests that this
test could be used most effectively as a criterion test for measuring mastery
of certain skills, but it should not be used for comparative purposes unless
local norms are developed for that purpose by the teacher or school district.

• Lee-Clark Reading Readiness Test
Overview 

This is one of the better known readiness tests. Its reputation is largely
due to the many editions that have been available since as early as 1931. It
is the 1962 edition that this review covers.

The test is composed of three parts made up of four subtests. Part I
contains two 12-item tests of Letter Symbols, a total of 24 items. Part II
contains a 20-item Concepts test, and Part III consists of a 20-item Word
Symbols subtest.

A partial scoring key is provided. For two subscales, scoring is done by
inspection without mechanical aids or accessories. The manual suggests
that an extra test booklet can be marked or cut out for a scoring stencil.
Although this marked or cut-out copy would take an examiner time to
prepare, it would probably make scoring more convenient. Scores are
reported on a profile on the back of each test booklet. The profile provides
interpretation of grade placement, expectation of success rating, and
indication of months of delay before beginning formal reading instruction.
When completed, the profile provides the teacher with an overall picture
of the child's performance on the entire test. Two tables are provided for
grade-placement equivalents of total scores and classifications of high or
low readiness for entering first grade pupils and for end-of-year
kindergarten pupils. The grade placement scale of high, high average, low average,
and low was based on a small sample of 177 pupils who were given this
readiness test in the first month of first grade. In April and May the sample
subjects were given the Lee-Clark Reading Test: Primer. The pupils were
then divided into four groups in terms of their primer scores. This sample
was quite small, and one must question the lack of further descriptive
information pertaining to the sample. One must also be dubious about the
validity of such an exact and detailed analysis into four interpretative
categories. However, this approach is an improvement over the 1951
edition, which provided no statistical support for the interpretative tables.

Directions for administration of the test are clearly and exactly stated
in the manual on an appropriate language level for young children. The
authors caution examiners to use the exact directions and to administer
the tests in small groups. When the group exceeds 15 pupils, the authors
recommend an additional adult assistant. The illustrations have been
revised in this 1962 edition to enlarged, shaded drawings. However,
the drawings in subtest three are small and blurry. The pictures, done
in a soft green shade of ink, produce a pleasing, nonglaring effect. The
format is attractive and easy for children to manipulate.

Norms 

Norms for this readiness test are based on a "slight adjustment" of the
1951 norms. The 1951 norms were based on 5,000 entering first graders.
No further description of the 1951 norms is provided in the revised 1962
edition. Although the 1951 edition provides some information pertaining
to the norm sample, such as median chronological age, median IQ, and
racial background, test users should not be required to search out an
earlier edition for this important information. Since 1951, the norms have
been adjusted to produce slightly more difficult norms on the basis of
1,000 end-of-kindergarten and first grade pupils. Two of the first grade
samples were also followed up and tested at the end of the school year for
comparison. The slight adjustment of the 1951 norms provides a
questionable norm basis for kindergarten and first grade children of 1968. New
norms which utilize up-to-date samples would provide a better basis for
current norm tables.

Validity

Predictive validity studies are presented for the 1962 revision and for
previous editions. The 1962 revision validity studies were based on five
groups of entering first grade pupils. However, the manual states that the
mean scores for these groups approach the upper limits of the test and that
the standard deviations were restricted, such information indicating that
few of the pupils in these classes had scores that placed them below a
readiness classification of "high average." Therefore, the reported validity
will not be applicable to all test-consumer groups. The manual also reports
other predictive validity studies that were conducted, such as the study in
the public schools of Portland, Oregon. However, the manual states that
". . . the pupils engaged in this study were far advanced in readiness
development by the time they were tested." This fact is reflected in the mean
Lee-Clark Reading Readiness Test scores. This predictive validity study
with "far advanced" readiness students does not reflect the recommended
test administration time of end-of-kindergarten or entrance-to-first-grade
that is stated in the manual. The manual further cautions that ". . . if
testing is delayed too far into the first grade . . . the results for many
pupils in normal groups will do no more than verify that they are ready to
read." The "advanced" readiness sample group will also be inappropriate
for many test consumer populations.

The manual does not report any discussion of face validity or sampling
validity. To the observer, however, the individual subtests do appear to
have appropriate content. Judged against the manual's own behavioral
definition of the readiness trait, the subtests do not represent all the
aspects of readiness. The test authors suggest that a definition of readiness
should include physical maturity, motivation, mental ability, emotional
adjustment, and experiential background. Because the test
measures a more limited number of skills, the authors rightfully
recommend that this readiness test should not be the sole measure or basis for
decisions on pupils' reading readiness.

Reliability 

Reliability coefficients ranging from .87 to .96 were established by the
split-half method, with the half-test correlations corrected for length by the
Spearman-Brown formula. The sample on which these reliability coefficients
were based is not described adequately. The following is the only
description of the sample: ". . . unselected school samples having means and
standard deviations typical of a majority of schools in which the test is
administered." The standard error of measurement showed that the chances
were two to one that the examinees' readiness grade placements would not
vary more than two months, and nineteen to one that they would not vary
more than four months, from their true readiness grade placement.
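Odds of two to one and nineteen to one correspond closely to bands of about one and two standard errors of measurement on either side of a score. If scores are assumed to be normally distributed about the true score, the manual's figures therefore imply a standard error of roughly two months of grade placement. The sketch converts such odds into band widths and computes the standard error from a standard deviation and a reliability coefficient. The SD, reliability, and observed score in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def z_for_odds(odds: float) -> float:
    """Half-width, in SEM units, of a band the true score falls in with
    `odds` to 1 (assuming normally distributed errors)."""
    coverage = odds / (odds + 1)
    return NormalDist().inv_cdf(0.5 + coverage / 2)


def score_band(observed: float, sem: float, z: float) -> tuple[float, float]:
    """Range expected to contain the true score at the chosen z."""
    return observed - z * sem, observed + z * sem


if __name__ == "__main__":
    # Odds quoted in the manual -> band half-width in SEM units.
    print(round(z_for_odds(2), 2), round(z_for_odds(19), 2))  # 0.97 1.96
    # Hypothetical SD and reliability, chosen only to show the arithmetic.
    sem = standard_error_of_measurement(sd=6.0, reliability=0.91)
    print(score_band(observed=12.0, sem=sem, z=z_for_odds(19)))
```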

Intercorrelation coefficients and reliability coefficients based on the
Kuder-Richardson formula were computed for part raw scores. This matter
is important for diagnostic use of the part scores. The resulting coefficients
showed that Part I (Letter Symbols), Part II (Concepts), and Part III (Word
Symbols) scores were sufficiently independent for utilization in
determining whether pupils have understanding of spoken words and concepts.
Considering this test's brevity (it takes only ... minutes to administer), the
reliability is surprisingly adequate.
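The Kuder-Richardson formulas estimate internal consistency from a single administration of items scored right or wrong. The review does not say which Kuder-Richardson formula the manual used. The sketch below shows KR-20, the most common version, applied to an invented set of responses from four pupils on three items.

```python
from statistics import pvariance


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for right/wrong (1/0) item scores.

    responses[p][i] is pupil p's score on item i.
    KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    n = len(responses)
    totals = [sum(row) for row in responses]
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n  # proportion passing item i
        sum_pq += p * (1 - p)                     # item variance p * q
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - sum_pq / total_variance)


if __name__ == "__main__":
    # Invented responses: four pupils, three items.
    data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(kr20(data), 2))  # 0.75
```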
Evaluation of Subtests and Items

Subtest one requires the child to match letters in one column with the
next. This test is called Matching in the scoring key rather than Letter
Symbols-Part I as it is named in the manual. The test is short, but the score
is combined with test two, a 12-item subtest, which requires the child to
match the correct letter out of an array of four letters. The combined 24
items make a test of sufficient length.

Part II (test three) is named Concepts but is referred to as Cross-Out in
the scoring key. The child is presented with a series of pictures and is to
"cross out" the item named by the teacher. The pictures, in the
writer's opinion, are somewhat blurred and too detailed for adequate
discrimination by young children. It is interesting to note that this subtest
has the lowest reliability, too low for individual use; the style of
presentation may be a factor. The research on concept learning favors simple
line drawings rather than overly detailed pictures for the presentation of
concepts. The pictures are not too dated, but the writer suspects that the
inner-city or rural, impoverished child would not be familiar with many of
the concepts presented. No data are presented that indicate whether the
test has been used with a variety of schools. The lack of information on
the sample on which these data are based makes it extremely difficult to
know for what group to recommend the test. It may be that suburban
schools, which have long used readiness tests, make up the population on
which the reliability was established.

Within each subtest, the items are sequenced in order of increasing
difficulty, and the transition between subtests is smooth.

The test authors' stated purpose of this test is not only to predict
ability to learn to read but also to provide data for intraclass grouping and
to analyze reading readiness needs. It should be noted, however, that the
number, nature, and length of the subtests do not lend themselves to
giving this information. This readiness test appears to be more useful as a
gross screening device than as a diagnostic tool. Therefore, the
specific items and subtests do not reflect all of the test authors' stated
purposes.

Summary 

As noted before, it is the test manufacturer's responsibility to provide
adequate information about the norming of the test and the nature of the
population of students used to establish reliability and validity. From the
limited information given, the writer suspects the population consisted of
groups of brighter-than-average, middle-class children. Thus, the students'
test scores in a class should be evaluated with great care. While the total
score reliability is high, the Part II Concepts score should not be
used alone.

The entire test may be useful as a gross screening device, but it is not
sufficiently broad in the skills that it measures to provide diagnostic
assessment of children's readiness strengths and weaknesses.
• The Metropolitan Readiness Test



Overview 

The Metropolitan Readiness Test is widely known and was recently
revised in 1964. The 1964 edition is reviewed here.

The test gives scores for six subtests and a total score; an optional
subtest, Draw-A-Man, is included. The subtests are Word Meaning,
Listening, Matching, Alphabet, Numbers, and Copying.

A separate scoring key is provided to facilitate hand scoring. A scoring
service is also available, but no description of the scoring service is
provided in the test manual. Tables for converting raw scores into percentile
ranks, stanines, and statements of readiness (such as superior, high normal,
average, low normal, and low) are conveniently arranged and are easy to
utilize. The directions for administration of the test are clearly and exactly
stated in the manual and include directions and illustrations to help score
the subjective Draw-A-Man subtest. The oral directions to the pupils are on
an appropriate language level for young children. The format is adequate
for the age level and includes green lines to separate items and symbols to
help the children locate and maintain their places in subtests where
needed, such as in Alphabet and Numbers. Although most of the items are
current, the majority appear to be drawn from middle-class experiences of
suburbia, particularly the eastern part of the United States. In addition, it
would appear that some individual items may be measures of intellectual
functioning rather than measures of readiness to begin formal reading.

Norms 

The norming population consisted of 12,231 pupils in 65 school
systems in a wide, regional distribution in the New England, Middle
Atlantic, Central, and South Pacific states in 1964. Unlike many test manuals,
this one states a caution in the use of the norms because the median income
of the sample is slightly higher than average. The test developers encourage
establishment of local interpretative norms based on local experience. This is a
very desirable statement, as the American Psychological Association's
Standards for Educational and Psychological Tests and Manuals points
out: "Local norms are more important for many uses of tests than are
published norms. In such cases the test manual should suggest appropriate
emphasis on local norms . . . ."

Validity 

In regard to content validity, the term and concept of "readiness" are
carefully defined, and a list is provided of the most important components
of first grade readiness in the view of the authors. After this breakdown of
the total area of readiness into categories, the content of each of the six
subtests is discussed against a background of the analysis of the
components of readiness. While the content of the scales appears to be
appropriate, the word meaning and listening comprehension scales seem to
contain some items that are more suitable for middle-class youngsters.

Evidence of concurrent validity is presented by means of correlations of
the Metropolitan subtests and the total score with scores on the
Murphy-Durrell Reading Readiness Analysis and the Pintner-Cunningham Primary
Mental Ability Test. The total score on the Metropolitan correlated quite
highly (.80) with the total score on the Murphy-Durrell. There was also a
correlation of .85 between the Letter Naming subtest of the Murphy-Durrell
and the Alphabet subtest of the Metropolitan. Other correlations
among the subtests were small and probably indicate that the two
readiness tests sample the abilities that make up their respective composites in
different ways. A correlation of .76 was found between the total score
on the Metropolitan and the total score on the Pintner-Cunningham test.
Because little information about the sample is given in the manual, the
prospective user is not able to judge whether the reported validity is
pertinent to his situation. Predictive validity studies are reported only for the
three experimental forms rather than for the final forms, A and B. In the
experimental forms, the Alphabet subtest seemed to be the best predictor
of future success in reading. The Numbers subtest was a good predictor of
both future reading and arithmetic success. Additional predictive validity
studies are being conducted now and will be provided in future editions of
the manual. These studies are available upon request from the company
and will be included in a new manual.
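Squaring a correlation gives the proportion of variance that two measures have in common. This is a useful check on how far two tests can be treated as interchangeable. The sketch applies this arithmetic to the three coefficients reported above: .80, .85, and .76 correspond to about 64, 72, and 58 percent of shared variance. Even the highest of these leaves more than a quarter of the variance unshared. The sketch inherits the manual's silence about the sample.

```python
# Correlations reported in the review for the Metropolitan Readiness Test.
REPORTED = {
    "Metropolitan total vs. Murphy-Durrell total": 0.80,
    "Metropolitan Alphabet vs. Murphy-Durrell Letter Naming": 0.85,
    "Metropolitan total vs. Pintner-Cunningham total": 0.76,
}


def shared_variance(r: float) -> float:
    """Proportion of variance two measures share: the squared correlation."""
    return r * r


if __name__ == "__main__":
    for label, r in REPORTED.items():
        print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.0%}")
```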

Evaluation of Subtests and Items 

The following subtests are included: Word Meaning, Listening,
Matching, Alphabet, Numbers, Copying, and an optional subtest, Draw-A-Man.
The names of the subtests are not sufficiently descriptive. For example, the
names Matching and Copying are not expanded or explicit enough to describe
whether words, letters, and/or geometric forms are matched or copied.
The Matching subtest actually measures visual perception involving the
recognition of similarities through the use of words and forms. The
Copying subtest involves letters, numbers, and forms. The length of each of the
subtests is quite short, and the reliability of the individual subtests is much
lower than that of the total score. However, the manual does suitably
discourage attachment of significance to the individual subtest scores. The
number and type of subtests appear to be consistent with the purposes of
the authors, for they designed this test as a measure of readiness for first
grade instruction; therefore, both number and reading readiness factors are
considered. The authors also suggest that teacher ratings, observations,
informal tests, and their readiness inventory be used as supplementary
aids, because ". . . paper and pencil readiness tests do not measure all the
components of general readiness for specific skills, such as reading or
arithmetic."

The brevity of the subtests is probably due to the age of the children
for whom the test is constructed. The writer concurs with the authors'
recommendation that the subtest scores not be used and would further
advise that these scores should not be recorded on a student's record. The
studies provided in the manual by the company indicate that, as predictors,
the subtest scores are not consistently stable enough from sample to sample
to place confidence in them. The total score, however, appears to be a good
predictor of reading success.

Reliability 

Reliability was determined with odd-even coefficients corrected by the
Spearman-Brown formula. Although the sample for which the reliability
was computed consisted of students from three school systems, the sample
is not further described. The reliability of the total score was above .90 in
all three of the sample groups. Reliabilities of the subtests are lower but
sufficiently high to merit confidence, except for scale two, the listening
section. Judging from the data presented in the manual, the reliability of
this scale is much lower than one would desire. Since two forms of this test
are available, additional reliability information pertaining to the
correlations between the two forms should also be reported in the manual. In the
section on "construction" of the test, the manual only states that
item-discrimination indices were used as a basis for selection of the items for
the two final forms that are considered "equivalent." More evidence to
support the assumed comparability of Forms A and B is needed. A separate
handout sheet does report these data and is intended to be included in
the manual in the future.

At the present time, the test manufacturers do not provide in their
manual a complete enough description of how the test acts as a predictor
for multiple samples. In a new manual, evidence is offered that the test
may not serve so well with rural southern children, where the correlation
between the Metropolitan Achievement Test and the Stanford Achievement
Test is .60 to .63.

The test manufacturers do provide many single-sheet summaries of data
collected by individuals in various locations who have used the test. These
data strongly support earlier conclusions about the test. The correlations
of the Word Meaning and Listening subtests with actual reading success are
much too low in many populations for one to place much confidence in
the scores as predictors of which children will learn to read. This fact is
particularly true of samples from the South (South Carolina and
Mississippi) and samples from predominantly rural states (Wisconsin). Even the
total score correlates with reading achievement in these samples less highly
than would be desired. The question of the test manufacturer's
responsibility for providing adequate data about the validity and reliability of the
tests has been raised in this review. It is maintained that before a test is
offered for sale, adequate information should and must be gathered by
those who desire to sell the test. The data which were collected in schools
after the test was presented for sale were interesting but were gathered
under such varying test conditions and computed by such a variety of
personnel that it is difficult to know how to interpret them.
Adequate validity, reliability, and norming data should be obtained
before the test is presented for sale.

Summary 

The test appears to be a good predictor of readiness for kindergarten
and beginning first grade youngsters. The test is probably most suited for
middle-class suburban children. In addition, the total test score may serve
with this population as a rough screening device of intellectual
functioning. With children from middle-class communities outside of suburbia, the
test probably will do equally well. The test scores should be interpreted
with great care with children from lower socioeconomic, rural, and southern
areas. The writer strongly advises against the individual use of the subtest
scores, as do the test manufacturers, particularly for Word Meaning and Listening.

• The Murphy-Durrell Reading Readiness Analysis 
Overview 

The Murphy-Durrell Reading Readiness Analysis, published in 1965, is
an outgrowth of the Murphy-Durrell Reading Readiness Test, published in
1949. Only one form of the test is available. Subtests include Phonemes, 
Letter Names, and Learning Rate. 

Students record their answers directly in the test booklet, and a
separate scoring key is helpful in scoring the test. The directions to the
examiner are clear and concise, and the test format is attractive and
appears to be easy for children to follow. The pictures and printing are
legible and current, and there appears to be a minimal amount of cultural
bias because students do not have to name the items pictured.

One will find tables in the manual for converting raw scores into sta- 
nines, percentiles, and quartiles. These tables include conversions for both 
the subtests and the total test score. The tables are easy to use, and the 
authors have provided a clear and concise description of how to interpret 
each type of score and how to plan instruction on the basis of these scores. 
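The relation between these score types can be sketched by converting a percentile rank into a stanine and a quartile. The sketch assumes the conventional stanine distribution, with 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of pupils in stanines 1 through 9. The publisher's tables are built from the raw-score distribution of its own norm group, so scores near a boundary may be classified differently there.

```python
from bisect import bisect_right

# Upper cumulative-percentage boundaries of stanines 1-8 under the usual
# 4-7-12-17-20-17-12-7-4 percent distribution; stanine 9 is everything above.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine(percentile_rank: float) -> int:
    """Stanine (1-9) for a percentile rank between 0 and 100."""
    return bisect_right(STANINE_CUTS, percentile_rank) + 1


def quartile(percentile_rank: float) -> int:
    """Quartile (1-4) for a percentile rank between 0 and 100."""
    return min(int(percentile_rank // 25) + 1, 4)


if __name__ == "__main__":
    for pr in (3, 18, 50, 82, 97):
        print(f"percentile rank {pr}: stanine {stanine(pr)}, quartile {quartile(pr)}")
```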

Norms 

Use of the conversion tables is, however, somewhat limited because of 
the lack of a complete description of the standardization population. The 
manual states that several pertinent data items, such as type of
community, median income, and number of years of education completed by
adults in the community, were collected and used in selecting the norm
group; however, none of this information is included in the manual. The
regional distribution of the norming population is also quite uneven.
Approximately six percent of the norming population is from the South,
and that six percent comes from only one state. Under these conditions,
one would probably find the test results more interpretable if one
developed one's own local norms rather than relying on the publisher's norms.
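Developing local norms can be as simple as ranking one's own pupils. The following is a minimal sketch. It assumes a list of raw scores from a local group and the common definition of percentile rank: the percent scoring below a score plus half the percent scoring exactly at it. The function is a hypothetical illustration, not a procedure described in the manual.

```python
def local_percentile_ranks(scores: list[int]) -> dict[int, float]:
    """Percentile rank of each distinct raw score in a local group.

    Uses the convention: percent of pupils scoring below the score
    plus half the percent scoring exactly at it.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one score")
    ranks = {}
    for s in sorted(set(scores)):
        below = sum(1 for x in scores if x < s)
        at = sum(1 for x in scores if x == s)
        ranks[s] = 100 * (below + 0.5 * at) / n
    return ranks
```

With a larger local group, the resulting table can be set beside the publisher's percentiles for comparison.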
Reliability

The reliability of the test was determined by randomly selecting 200
students from the norming population and computing split-half
correlations. Basically, this procedure divides the test in half by putting
all the odd-numbered items in one group and the even-numbered items in
another group; the reliability is then estimated from the agreement
(correlation) between these two halves. Using this procedure, the total test
score appears to be quite reliable. The manual also cautions the test
consumer to think of a pupil's score as falling within a range of possible
scores rather than at a particular point. This is useful advice for
interpreting scores. The reliability evidence would be more useful if a
complete description of the 200 cases were given, because reliability
coefficients can vary from population to population.

Validity 

The manual suggests that prospective test users examine the test to
determine content validity, but the authors do not include a discussion of
their definition of "reading readiness." According to the American
Psychological Association's Standards for Educational and Psychological Tests
and Manuals, ". . . the manual should indicate clearly what universe
(content) is represented and how adequate is the sampling." This standard
is considered "essential."

Predictive validity evidence for a relatively small sample of 200 pupils
from four school systems in Kansas indicates that the Murphy-Durrell test,
given at the end of kindergarten, is somewhat predictive of reading
achievement as measured by the Stanford Achievement Test, Primary I, when
given at the end of first grade. Approximately 43 percent of the variance in
performance on the reading test was accounted for by the readiness test. The
publisher has also indicated that additional predictive validity data, which
have been gathered since the publication of the test, are available; but if
this information is to improve the interpretation of the test scores, it
should have been gathered prior to the publication of the test and
included in the test manual. The lack of a description of the population
or the reading program for these 200 students in Kansas also limits the
interpretation of the validity data.
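The phrase "accounted for" refers to the squared correlation between the two tests. Working backward from the reported figure gives the approximate correlation itself (a derived value, not one quoted in the manual):

```latex
r^2 \approx 0.43 \quad\Longrightarrow\quad r \approx \sqrt{0.43} \approx 0.66
```

The remaining 57 percent of the variance is left to other factors and to measurement error.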

Evaluation of Subtests and Items 

The three subtests are somewhat different from the usual subtests on a
reading readiness test. For the Phonemes subtest, the student is to select
from four words those that begin with a phoneme given by the examiner.
The words are represented by pictures, but the item in each picture is
named by the examiner. For the Letter Names subtest, the child is to
select from five alternatives the letter named by the examiner. Part one
tests knowledge of capital letters, and part two tests knowledge of
lower-case letters. The Learning Rate subtest measures the number of
words retained an hour after instruction.
All of the subtests are closely related to the actual task of learning to
read, and, therefore, they would seem to be quite useful for diagnosing
students' readiness to begin reading. The test authors, however, fail to
include any discussion of other factors which should be considered by the
teacher. The discussion of the use of the test results seems to indicate that
the skills measured by the test are the only factors to be considered.

For a random sample of 200 cases from the norming population, the
split-half reliabilities of the Phonemes and Letter Names subtests were quite
high (.94 and .97), but the Learning Rate subtest had a much lower
reliability; this result is probably caused in part by the relatively short
length of that subtest. It is believed that these reliability indexes are high
enough for one to make separate use of the subtests.
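The effect of test length on reliability can be estimated with the Spearman-Brown prophecy formula. The formula assumes that any added items are comparable to the existing ones. The sketch below uses purely hypothetical reliabilities, not the Learning Rate figure from the manual, and the function names are illustrative.

```python
def predicted_reliability(current_r: float, factor: float) -> float:
    """Spearman-Brown: reliability after lengthening a test by `factor`
    with items comparable to the existing ones."""
    return factor * current_r / (1 + (factor - 1) * current_r)

def length_factor_needed(current_r: float, target_r: float) -> float:
    """Spearman-Brown solved for length: how many times longer the test
    must be to raise reliability from current_r to target_r."""
    if not (0 < current_r < 1 and 0 < target_r < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return (target_r * (1 - current_r)) / (current_r * (1 - target_r))
```

For example, a subtest with a reliability of .60 would need to be about six times as long to reach .90. This helps explain why short subtests are hard to make dependable.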

The subtests predict reading achievement less accurately than the total
test score does. Phonemes is the best predictor, and Learning Rate is
the poorest. The range of these predictions indicates that from 15 to 58
percent of beginning reading achievement is accounted for by the various
subtests when they are considered separately. The authors suggest that the
total score should be used in planning instruction. Because of the lack
of diagnostic validity evidence, the writer would support this procedure
and suggest only limited use of separate subtests.

Summary 

The Murphy-Durrell Reading Readiness Analysis includes subtests 
which are very similar to the skills of beginning reading. However, the 
skills are related primarily to the decoding aspects of beginning reading, 
and it is, therefore, suggested that if this test is going to be used as an 
indication of readiness, other factors should be considered. The test is 
probably most useful as a criterion measure of specific skills, and use of the
test should probably be based on one's classroom experience and the
development of local norms. 
Chapter 3



SELECTING A READING ACHIEVEMENT TEST 

PROBABLY more tests are administered at the end of the school year
than at the beginning, and this practice is a reflection of the purposes a
teacher has in mind in using tests; that is, most schools are more concerned
with how much a child has learned in a given year than with giving a test
early to assist in planning instruction.

Tests should be selected according to the desired purpose. If one wishes
a quick overview of one's students' current status, administering an
achievement test early in the school year is a desirable practice. If one
wishes to evaluate the instructional program, administering the
achievement test late in the school year is a useful practice. To determine the
specific strengths and weaknesses students possess, a diagnostic test or an
achievement test with appropriate subtests should be selected.

These purposes should guide one's selection of a reading test. One will
want to know how accurately the subtests can be used as diagnostic
assessments, how reliable and valid the results are, and exactly how the test
was normed, in order to determine whether the grade placement, percentile, or
stanine scores are appropriate for one's class. In essence, these points
should be considered in choosing the test. The charts in the Appendix give
a quick overview of the major characteristics of the tests reviewed and an
evaluation of each test.

• California Reading Tests 
Overview 

The California Reading Tests are part of a larger battery of tests called
the California Achievement Tests (CAT). These tests have a long history
and have been through several revisions since they first appeared in 1934.
This review covers the current 1957 edition, renormed in 1963.

The reading tests are divided into three levels: Lower Primary, grades 1
and 2; Upper Primary, grades 3 and 4; and Elementary, grades 4 through
6. In addition, several forms of the test are provided for each level. It
should be noted, however, that this review covers only Form W, as the test
manual and technical report do not include data for any other form. The
test manufacturers rightly caution against using other forms for research
purposes and clearly state that the standardization and normative work
were done only for Form W. The publisher states that critical users or
experimenters should use Form W ". . . since conclusive evidence of form
equivalence is not available" (page 24, CAT Technical Manual). While
there is no empirical evidence that the forms are not equivalent, there is,
more importantly, no evidence that the forms are equivalent. Thus, all
forms other than Form W are an unknown quantity and are not
recommended for use.

All of the reading tests are divided into two subtests, Reading
Comprehension and Vocabulary. Each of these subtests is then divided into
sections. The manual suggests that the sections were devised to ease
administration and to reduce the time that it would take the child to
complete the entire test. The section scores are not to be used for grouping
or instructional purposes.

The California Reading Test, Lower Primary level, is divided into two
subtests: Reading Vocabulary and Reading Comprehension. The Reading
Vocabulary subtest contains four sections with a total of 75 items: Word
Form, 25 items; Word Recognition, 20 items; Meaning of Opposites, 15
items; and Picture Association, 15 items. The 75 items seem to be an
adequate measure of reading vocabulary. Item 12, Test I, Section D, is the
only item that needs to be questioned in terms of picture-response clarity.
It is the reviewer's opinion that the illustration could be misinterpreted by
examinees. However, this is a minor point.

The Reading Comprehension subtest is divided into two parts: Following
Directions, five items; and Interpretation, ten items. The items in these
two parts are reportedly designed to measure skill in following directions,
noting specific facts, and making inferences. Although these three skills
appear to be what the authors are attempting to measure, a group of
experienced reading teachers found it difficult to ascertain which reading
skill was being tested in any particular item of the Reading Comprehension
subtest. The test authors recommend that the section scores be used as
indicators of areas of reading disability. However, the length and
reliability of the various sections are such that attaching much significance
to the vocabulary or comprehension section scores individually should be
avoided. The various sections appear to be controlled for readability.
The items are arranged from easy to difficult in each section, and the content
of the test is current and does not appear to favor specialized backgrounds.
A separate section, Letter Recognition, is included at the end of the
regular test and is to be used with those students who obtain very low
scores on the Reading Vocabulary and Reading Comprehension subtests. The
Letter Recognition section contains 24 items and is designed to help the
teacher gather additional information on general performance with verbal
symbols. This section requires that the examinee identify alphabetical
letters in their capital and lower-case forms. The child indicates whether
the words joined by a dotted line are the same or different. The two words
may or may not appear in the same printed form. This particular section
could yield valuable information on certain word recognition skills for
poor readers.
For the most part, the names of the Reading Vocabulary sections
adequately describe what each section is attempting to measure. In the
writer's opinion, however, the label Reading Comprehension does not
adequately describe the nature of the tasks in this section of the test.

The California Reading Test, Upper Primary level, is divided into two
tests: Reading Vocabulary and Reading Comprehension. The names of the
Reading Vocabulary and Reading Comprehension tests are adequate for a
general description of the task involved. Reading Vocabulary contains two
sections with a total of 45 items: Word Recognition, 20 items, and Meaning
of Opposites, 25 items. Reading Comprehension contains three
sections with a total of 51 items: Following Directions, 15 items; Reference
Skills, 15 items; and Interpretation of Material, 21 items. A separate
Word Form test, which consists of 25 items, is included at the end of the
regular test for diagnostic purposes and is to be used only with those
students obtaining very low scores on the total reading test. The Word
Form test consists of pairs of words which the examinee is to mark as
same or different in appearance. In a very limited way, the Word Form test
provides the examiner with some diagnostic information in the area of word
recognition skills. The test authors claim that the principal value of
the section scores is their indication of existing weaknesses. However, the
sections are relatively short, and the reliability coefficients would not
warrant attaching much significance to individual vocabulary or
comprehension scores. Readability appears to have been a consideration in
the construction of the various subtests. Item progression in each
subtest is from easy to difficult. The content of the test items is current
and does not appear to favor any particular background of experiences.

The California Reading Test, Elementary level, is divided into two parts:
Reading Vocabulary and Reading Comprehension. Mathematics, science,
social science, and general vocabulary sections are included under Reading
Vocabulary; following directions, reference skills, and interpretation
sections are included under Reading Comprehension. The test names
adequately describe the task in each of the sections. Fifty vocabulary
items are used for the four vocabulary sections, and 60 are used for the
comprehension subtests. The reviewer noted that experienced teachers,
when asked to do so, had difficulty classifying the 50 vocabulary terms
into the categories of mathematics, science, social studies, or general
vocabulary. A similar group of trained reading teachers disagreed in
labeling the skills the authors claimed were being tested in various items in
the Reading Comprehension section. The authors caution against attaching
undue significance to the scores on separate sections but go on to state
that the principal value of the scores is their indication of existing
weaknesses. There is no evidence cited to support this claim, and the writer
recommends using the scores with caution. The sections do appear to be
controlled for readability, and the items are arranged from easy to
difficult. The item content is current, and favoritism for specialized
backgrounds has been avoided in item development.
Administration of the California Reading Tests, at all levels, does
not appear to be difficult. The directions and examples are clear and
concise. All the subsections are timed tests; therefore, a stopwatch or clock
with a second hand is needed. Scoring is less rigid for the Elementary level
test: students may mark their answers on separate answer sheets, such as
IBM, or use SCOREZE, a self-scoring answer sheet available through the
publisher. IBM answer sheets may be less expensive and can be scored rapidly
by machine or with hand-scoring overlays.

For the Lower and Upper Primary level tests, students are required to
mark their answers in the booklets, and these must be scored by hand.
When scoring any test by hand, one is cautioned not to make a diagnostic
interpretation of the subsections. These scores are not to be used in this
way, and faulty interpretation of a student's skills can result from
interpreting individual items.

Grade placements, percentiles, stanines, and standard scores are provided
for each test and the total score. All tables are clearly identified and easy
to use.

Although the manual cautions that the test sections are not normed and
no grade placement should be attached to these scores, there are columns
on the profile sheet for obtaining these scores. The technical report
rightfully cautions against their use. The writer suggests that the section
scores should not be converted to grade equivalent scores and that the test
manufacturers remove those columns from the profile.

Although a Diagnostic Profile is provided, the writer does not suggest its
use. Data are offered in the technical report and the manual for the tests as
general reading achievement measures, but the power of these tests to
diagnose specific reading difficulties has not been demonstrated. Until
such evidence is available, the writer suggests using only the three
scores (reading vocabulary, comprehension, and total score) as a measure of
general reading achievement.

Norms 

The norming population for the California Achievement Test, Primary,
Upper Primary, and Elementary, 1957 edition, was extensively controlled
for geographical and instructional program bias. The renorming of the test
in 1963 used a more limited sample but appears to be quite adequate in
number of pupils and range of abilities. The technical report discusses the
norming extensively and presents cautions in interpreting individual scores.
As in other reviews, the establishment of local norms is suggested.

Reliability 

Split-half reliabilities are reported only for the two subtests and total
scores for each level. The reliabilities reported are for one grade level for
each test; that is, 1.7 for Lower Primary, 2.7 for Upper Primary, and 5.1
for the Elementary.
The Lower Primary level reliability for Vocabulary is sufficiently high,
but the reliability for Reading Comprehension is somewhat lower than
desired. The Upper Primary and Elementary level reliabilities are very
acceptable.

Reliabilities for each grade level, rather than those reported, would be
desirable. Because the Lower Primary is for first grade and beginning
second, a sample at the 1.7 grade level is adequate. However, the
Elementary is designed for fourth, fifth, and sixth grades, and
reliabilities are only reported for grade five. It is assumed fourth and sixth
grades would have similar reliabilities, but one would like these data
reported.

The Upper Primary test-retest reliabilities reported for 90 students at
grade placements of 2.8 and 3.8 are lower than one would desire for the
reading comprehension section (.59). The vocabulary and total score
test-retest reliabilities are sufficiently high.

Test-retest reliabilities are reported for 90 students at grade placements
4.8 and 5.8 and for 125 students beginning at grade placement 5.8. These are
quite high and well within a desirable range.

In addition, test-retest reliabilities are reported for all three levels for
students who took the lower form and were then tested on the next higher
form. These are all adequate except for the correlation of the Reading
Comprehension section of the Upper Primary with the Elementary Reading
test of comprehension, where the coefficient as reported is .54. Why the
Upper Primary Reading Comprehension test-retest reliability for the same
form or the next higher form is so low is not made clear.

The lower reliability may be a product of rapid skill development in
children at this age, or it may reflect program changes in the teaching of
reading. Comprehension skills tend to be stressed by teachers at the end of
second grade and more in third grade. This score may be a reflection of
changes in program emphasis, but this is only a guess. One should interpret
this score with other information available from the classroom program,
such as the pupils' daily reading assignments and independent reading
activities.

Validity 

Validity for the series is reported in two ways: 1) by having a team of
experts inspect all items and 2) by correlating the CAT with other
achievement tests.

The "experts" who examined the items generally agreed that the items
included in the test represented essential concepts or concepts of major
importance. As previously suggested, any test must be examined by the
teacher to see if it matches the content as taught in his classroom.
Examination of the test leads one to agree that items on the test are
important aspects of a basic reading program.

Listed in the manual, a table called "Diagnostic Analyses of Learning
Difficulties" breaks the items into groupings within the test. One should
examine this table to determine agreement with what is included. The
writer can find no empirical evidence to suggest these skills are being
measured; however, no criticism of the test authors is made for this lack, as
an achievement test must sample from a range of skills. The writer does
caution, however, that evidence to support the test as a diagnostic instrument
is not given and, further, that the manual and technical report caution against
using it so. There is no evidence offered, for example, that the
mathematics vocabulary is a reliable or valid sample of mathematics vocabulary.
This statement can be applied to any of the other subsection scores. Their
inclusion on the profile may mislead one into confidence that the child has
mastered these skills. These scores are, as the test authors suggest, only
"cues" for one to verify with other data.

The correlations of the CAT with other reading achievement tests are
high, but none are reported for the Lower Primary.

Extensive correlational data between the subtest scores and the California
Short Form Test of Mental Maturity, 1963 revision, are reported. These data
can be interpreted as evidence of the validity of the test. Again, Lower
Primary Reading Comprehension scores correlated much lower than the other
subtests.

The writer questions, however, whether the correlation between the
achievement test and the Mental Maturity Test should be this high. The
correlations as reported suggest that either the Mental Maturity Test is an
achievement test or the California Reading Test is a mental maturity
test. The correlations of the Upper Primary and Elementary Reading total
scores with the CTMM short form are .79 and .81, respectively, indicating
that these two tests measure somewhat the same skills.

Summary 

The California Reading Tests span the elementary level nicely, are easy
to administer and use, and are in general reliable and valid measures of
reading behavior. One will have to inspect the content of the test to
determine how closely it matches one's classroom instruction. The Lower
Primary Comprehension test is not as reliable as one would desire, but the
three scores (vocabulary, comprehension, and total) for each test are
useful as a measure of general reading achievement.

The major criticism of the test is that the section scores on the profile
sheets and the Reading Diagnostic Profile have not been demonstrated to
be reliable or valid measures. Although the manufacturer cautions against
the use of these scores, their inclusion, it is believed, can be misleading.

Only information on Form W is provided, and the writer suggests that 
this is the form to be used until information is provided by the publisher 
for the other forms. 

• Gates-MacGinitie Reading Tests
Overview 

The Gates-MacGinitie Reading Tests are a new edition standardized in
1965 and developed to replace the Gates Primary, Gates Advanced
Primary, and the Gates Reading Survey. Included in this seven-test series
are tests for all grade levels, from first to twelfth. This review will cover
only the five tests from grade one up to and including grade six.

These five tests are Primary A for grade one, Primary B for grade
two, Primary C for grade three, Primary CS for grades two and three,
and Survey D for grades four through six. Primary A, B, and C and
Survey D include subtests for Vocabulary and Comprehension; in
addition, Survey D has a Speed and Accuracy subtest. Primary CS is a
Speed and Accuracy test for grades two and three. Two forms of the test
are available for Primary A, B, and C; three forms are available for
Primary CS and Survey D.

The scoring of all the tests is aided by scoring overlays. Tables in each
manual are provided to convert subtest and total test scores to grade
norms, standard scores, and percentiles. There are tables for beginning,
middle, and end-of-the-year testing times for each grade except first.
Middle and end-of-the-year norms are available for first grade. These tables
were developed by norming the test at the beginning and end of each grade
and then interpolating to estimate middle-of-the-year norms. This
procedure is far superior to the usual practice of administering a test at only
one time during the year.

In the technical manual the authors also provide tables for interpreting
differences between subtest scores and also for evaluating differences
between scores in estimating reading growth. The comparisons of subtest
scores are based on standard scores.

The educational significance of a difference between two subtest scores is
determined by the probability that the scores would differ by a certain
amount by chance alone. If a difference is large enough that it would arise
by chance no more than fifteen times out of a hundred, it is considered to
have educational value in planning a reading program. Formulas are also
provided for determining the significance of average subtest score
differences for groups of children.
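A standard way to judge whether two standard scores differ by more than measurement error would explain uses the standard error of the difference, SD * sqrt(2 - r1 - r2), where r1 and r2 are the two subtest reliabilities. The sketch below assumes that approach and treats the fifteen-in-a-hundred criterion as a two-tailed chance probability of .15. The values are hypothetical, and this is not a reproduction of the Gates-MacGinitie tables.

```python
from statistics import NormalDist

def difference_is_significant(score_a: float, score_b: float, sd: float,
                              r_a: float, r_b: float, alpha: float = 0.15) -> bool:
    """Return True if two standard scores (on a common scale with standard
    deviation `sd`) differ by more than chance would produce in about
    `alpha` of cases, judged two-tailed."""
    # Standard error of the difference between two scores with reliabilities r_a and r_b.
    se_diff = sd * (2 - r_a - r_b) ** 0.5
    # Critical z for a two-tailed test; about 1.44 when alpha = 0.15.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(score_a - score_b) > z_crit * se_diff
```

With a standard deviation of 10 and subtest reliabilities of .90 and .85, a difference would need to exceed about 7.2 points to count as meaningful under this rule.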

The tables for evaluating the significance of reading test gains also
use standard score differences, and these differences are again
considered significant only when a difference that large would arise by
chance no more than fifteen times out of a hundred. The use of these tables
will be very beneficial in interpreting the test scores. The technical
development of the tables appears to be very satisfactory, and the
suggestions for their use by the test authors are excellent.

For all of the tests, the total reading score is determined by averaging
the standard scores of the subtests. This average standard score can then be
converted to a percentile or grade score. The test authors correctly point
out that, when determining averages, it is not good practice to add and
divide raw scores, because they are not based on an equal-interval scale.
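Averaging standard scores rather than raw scores can be sketched as follows. The sketch assumes a simple linear standard-score scale with a mean of 50 and a standard deviation of 10, and it uses hypothetical norm-group means and standard deviations. The actual tests convert raw scores through the publisher's tables, and the function names are illustrative only.

```python
def to_standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Convert a raw score to a standard score on a mean-50, SD-10 scale."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

def total_standard_score(raws: dict[str, float],
                         norms: dict[str, tuple[float, float]]) -> float:
    """Average the subtests' standard scores instead of their raw scores.

    norms maps each subtest name to its (mean, standard deviation) in the norm group.
    """
    standard = [to_standard_score(raws[name], *norms[name]) for name in raws]
    return sum(standard) / len(standard)
```

For example, a Vocabulary raw score one standard deviation above its norm mean (standard score 60) and a Comprehension raw score exactly at its mean (standard score 50) average to 55. Averaging the two raw scores would have weighted the subtests unequally.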

The Gates-MacGinitie Reading Test, Primary A for grade one, and
Primary B for grade two are quite similar. Each has two subtests, Vocabulary
and Comprehension. The Vocabulary subtest has 48 items. The student is
to match a picture with the word it represents; four alternative responses
are provided. Some of the items appear to be measuring visual
discrimination of words, and others seem to measure the student's ability to
determine the meaning of the picture. The authors suggest that the test
measures ability to recognize isolated words. The pictures are clear and
up-to-date, and there seems to be a minimal amount of cultural bias in
the selection of items.

The Comprehension subtest measures the student's ability to read and
understand whole sentences and paragraphs. The student is to match the
sentence or paragraph to one of four pictures. For some of the items, it
appears as though the student could determine the correct response from
reading only one or two words in the selection. Because of this condition,
it is probable that the test is not measuring a much different ability than
the Vocabulary subtest. The reported correlations of these subtests (.67
for Primary A and .78 for Primary B) would seem to support this
contention.

Primary C for grade three follows the same pattern as Primary A and B.
However, the Vocabulary subtest has a total of 52 items; for 12 of these,
the student is to match the correct word with a picture, and for the
remainder of the items he is to select the best synonym for a stimulus
word. The Comprehension subtest includes 24 paragraphs, each of which is
followed by two multiple-choice questions. Some of these questions ask
students for meanings of words in the paragraphs and, therefore, as might
be expected, the correlation of the subtests as reported in the technical
manual is .83.

Primary CS for grades two and three is a test of reading speed and
accuracy. There are a total of 32 items on the test. Each item includes a
short paragraph and a five-option multiple-choice question or incomplete
statement. The students are given a total of seven minutes to work on the
test. Two scores, number attempted (speed) and number correct
(accuracy), are determined for each student. According to the publisher, the
accuracy score correlates to the same degree with both the Vocabulary and
Comprehension subtests for Primary C. The speed score correlates .34 and .53
with Comprehension and Vocabulary, respectively. From these correlations
it appears that the accuracy score is measuring the same set of skills as the
Vocabulary and Comprehension tests are, but the speed score appears to be
measuring a different variable.
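The two Speed and Accuracy scores are simple counts. The following is a minimal sketch. It assumes an item counts as attempted whenever the pupil has marked any option; the manual's exact rule for skipped items may differ. The names are hypothetical.

```python
from typing import Optional

def speed_and_accuracy(answers: list[Optional[str]], key: list[str]) -> tuple[int, int]:
    """Return (speed, accuracy) for one pupil on a timed test.

    answers[i] is the option the pupil marked for item i, or None if the
    item was left blank or not reached.
    """
    # Speed: number of items attempted (any option marked).
    attempted = sum(1 for a in answers if a is not None)
    # Accuracy: number of attempted items whose marked option matches the key.
    correct = sum(1 for a, k in zip(answers, key) if a is not None and a == k)
    return attempted, correct
```

For example, a pupil who marks three of four items and gets two of them right scores 3 for speed and 2 for accuracy.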

Survey D for grades four, five, and six includes Vocabulary and
Comprehension subtests and also the Speed and Accuracy subtest, similar to
Primary CS. The Vocabulary subtest has 50 items which measure the
student's ability to choose the best synonym for a stimulus word. Each
word is presented in isolation, and the student is to choose the correct
response from five alternatives. The test is timed; however, it seems likely
that most students should be able to complete the test in the 15 minutes
allowed. The Comprehension subtest consists of 21 paragraphs in which
there are a total of 52 blanks. For each of these blanks the student is given
five alternatives to choose from in selecting the word that best fills the
blank. Students are allowed only 25 minutes to work on this subtest. The
Speed and Accuracy subtest follows the same pattern as Primary CS.
However, there are 36 items, and the students are given five minutes. The
use of a time limit for the Speed and Accuracy subtest is, of course,
necessary; but it does not seem to be a defensible procedure on the
Comprehension and Vocabulary subtests, where the attempt is to measure
reading power. While it seems probable that scores on the Vocabulary and
Comprehension subtests would not vary significantly if more time were
allowed, evidence of this condition should be provided by the test
publisher.

The directions for all of the tests and the various subtests are clear and
concise. The subtests follow a logical pattern of increasing difficulty of
items. Some test authorities have suggested that it is better practice to
intersperse difficult items with easier items, but there is contradictory
evidence as to which is the better practice. Differences in
subtest scores should be interpreted cautiously; the tables provided for this
purpose in the technical manual should be used. The high correlations
between the Vocabulary and Comprehension subtests indicate that these
subtests are measuring quite similar traits; it also appears that the Speed
and Accuracy subtest is measuring a somewhat different trait.

Norms 

The tests were normed on 40,000 pupils in 38 communities. Because of
the number of test levels and test forms, it is probable that each test was
normed on about 2,500 pupils. The authors state that the communities for
the norming population were selected on the basis of size, geographical
location, average educational level, and average annual income. Although
these variables were allegedly controlled, the authors do not
describe the population. The norms can be cautiously accepted as being
representative of national performance; however, for a more precise and
meaningful interpretation it would be best to develop local norms.

Reliability

Reliability indexes were computed by both the split-half procedure and
the test-retest procedure utilizing different forms of the test. Reliabilities
are reported for each subtest at every grade level. These reliabilities are
based on testing in five separate communities, but these communities are
not further described. A more complete description of these communities
is vital in interpreting the reliabilities for one's classes. In general, the
reliabilities are high enough for one to feel fairly certain that the score a
student receives on one form of the test on one day will be close to the
score he receives on another form on another day. The alternate-form
reliabilities for the Vocabulary and Comprehension subtests range from .81
to .89, while those for the Speed and Accuracy subtest range only from .67
to .86. The split-half reliabilities for Vocabulary and Comprehension range
from .88 to .96. The split-half reliabilities for Speed and Accuracy are not
reported because of the problems of correlating alternate halves of a
stringently timed test.
Validity

Validity evidence for the test is very limited. The test appears to have
face validity for measuring what it purports to measure; and, according to
the test authors, the items were selected on the basis of a tryout with more
than 25,000 pupils. There is, however, no description of the curriculum
content which this test is supposed to be measuring. There is also no
evidence that the subtests were selected by examining the content of
reading programs. However, the authors' accepted expertise as specialists
in the area of reading behavior somewhat diminishes this criticism. As with
other tests, if one decides to use any of these tests with specific students,
one should carefully examine the objectives of one's reading program and
compare them to the content of the test.

Correlations between Survey D subtests and Lorge-Thorndike Verbal IQ
scores are reported for grades 4, 5, and 6. These correlations indicate that
for all of the subtests the relationship between reading scores and verbal
IQ scores becomes closer at higher grade levels. In addition, it appears that
vocabulary and comprehension scores are more closely related to verbal IQ
than are speed and accuracy scores. These correlations lend support to the
general conclusion that there is a great deal of similarity between group
measures of verbal IQ and group measures of reading achievement. Most of
this similarity is probably due to the amount of reading that is necessary
on a group verbal IQ test.

Summary 

The Gates-MacGinitie Reading Tests provide a measure of general reading
achievement for students from grades one through twelve. Only those
tests used in grades one to six are included in this review. In general, the
tests are well constructed, and the authors have provided a useful
procedure for interpreting the differences between subtest scores. This is a
welcome trend in the development of reading tests.

The tests have been normed at both the beginning and end of each
grade, and the subtests and total test scores are quite reliable. One should
certainly examine the test's validity for measuring the objectives of a
specific program by comparing the program objectives to the test
objectives. The development of local norms also would aid in the
interpretation of test scores. This test series appears to be one of the better
instruments available for measuring the reading achievement of students. It
should be useful for evaluating growth, screening students who are in need
of more diagnostic testing, organizing instructional groups, and cautiously
diagnosing subskill deficiencies.

• Iowa Silent Reading Test
Overview 

The Iowa Silent Reading Tests (new edition) are available in two separate
tests, one for grades one to eight and the other for grades nine to
thirteen. This review will consider only the elementary level of the test.
Many of the shortcomings of the test are due to its age. The test booklet
was published in 1942, and the latest test manual copyright date is 1943.
There are four forms of the test available. Eight separate subtests are
included: Rate, Comprehension, Directed Reading, Word Meaning,
Paragraph Comprehension, Sentence Meaning, Location of Information-
Alphabetizing, and Location of Information-Use of Index.

Hand scoring of the test is somewhat difficult because the test booklet
must be turned upside down in order to score some of the subtests, an
unnecessary complication for pupils taking the test. Because of the
hand-scoring difficulty, it is suggested that if one uses this test, one should
utilize the machine-scored answer sheets which are available from the
publisher.

Raw scores must be converted to standard scores before being converted
to percentiles, grade equivalents, and age equivalents. The total
standard score is determined by computing the median standard score for
all the subtests. A profile is printed on the front of the test booklet for
comparing subtest standard scores. It is recommended that one not
use this profile for diagnosing students' reading abilities, for two
reasons: 1) the norming population is quite inadequately described, and
one would be comparing one's students to some unknown group; and 2)
the reliabilities of several of the subtests are quite low. For example, the
reported split-half reliability of the comprehension subtest is .68 for grade
six students.
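The total-score rule described above, the median of the subtest standard scores, can be sketched in a few lines. The subtest names follow the list given earlier, but the standard scores are invented for illustration, and how the manual handles an even number of subtests is an assumption here (Python's `statistics.median` averages the two middle values).

```python
from statistics import median

# Hypothetical standard scores for one pupil on the eight subtests.
subtest_standard_scores = {
    "Rate": 148,
    "Comprehension": 155,
    "Directed Reading": 160,
    "Word Meaning": 152,
    "Paragraph Comprehension": 158,
    "Sentence Meaning": 150,
    "Alphabetizing": 162,
    "Use of Index": 146,
}

# With eight subtests, median() averages the fourth and fifth ranked scores.
total_standard_score = median(subtest_standard_scores.values())
print(total_standard_score)
```

Because a median is used, one unusually high or low subtest score has little effect on the total.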

The Rate subtest has two serious weaknesses. First, the students are not
given any purpose for reading the material; they are told only to read
carefully so they can answer questions about the story. The writer believes
that a reading-rate test should measure how rapidly a student can
accomplish a specific purpose. Second, the directions state that the student
may not look back at the selection to answer the questions, a stipulation
which means that the test is very heavily loaded with immediate memory.

The Directed Reading subtest appears to be an attempt to measure the
student's skimming ability. However, the inclusion of format and
typographical aids in the selection would greatly increase the value of the
test. The Word Meaning and Paragraph Comprehension subtests follow the
traditional pattern of utilizing words in isolation and multiple choice
questions following a selection. Use of these subtests should be based on an
analysis of a reading program and the content of these subtests.

The Sentence Meaning subtest appears to be measuring knowledge
other than reading ability. For example, one statement asks, "Do most
children attend the public school in the summertime?"

The Alphabetizing and Use of Index subtests are designed to measure
reading-study skills. If a measure of reading-study skills was desired,
it does seem that the authors should have included measures of other skills
such as use of the library and using parts of a book.

Norms 

Norms for the test were gathered in the spring of 1942 and are based on
9,000 pupils in 19 communities in 13 states "widely distributed
geographically." Because of the very limited description of the norming
population, and also because the test was normed over 25 years ago, there
is absolutely no validity for these norms, and one should not use them under
any circumstances. The test could be used for comparing student growth if
one were to develop local norms.

Reliability 

As indicated previously, the reported split-half reliabilities for most of
the subtests are too low to be sure that a student's score will not vary
considerably from day to day. The total reading score, which is the median
standard score for all the subtests, is more reliable. However, because the
reliabilities are based on a poorly defined norming population, an
evaluation of the reliability of the test is very difficult.
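The point about day-to-day variation can be made concrete with the classical standard error of measurement (SEM), sd * sqrt(1 - r). The sketch assumes a standard-score standard deviation of 10, which is hypothetical; .68 is the grade six comprehension reliability quoted above, and .90 is included only for comparison. The function names are illustrative.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM: sd * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(score: float, sd: float, reliability: float, z: float = 1.0):
    """Observed score plus or minus z SEMs; with z = 1 the band covers the
    pupil's true score roughly two times in three."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem

# Hypothetical SD of 10 standard-score points.
for r in (0.68, 0.90):
    print(r, round(standard_error_of_measurement(10, r), 2))
print(score_band(50, 10, 0.68))
```

Under these assumptions the SEM is about 5.7 points at r = .68 but about 3.2 points at r = .90, so a pupil's score on the less reliable subtest could plausibly shift by several points on retesting.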

Validity 

The test has been defined by the authors as a reading-study skills test,
and the makeup of the subtests appears to have face validity. The
development of the outline of skills for the test was based on a textbook,
Measurement and Evaluation in the Elementary School, which one of the
authors of the test coauthored. The only other validity evidence for the test
is the report of a small study which indicates that most of the subtests are
only minimally related to one another. This factor would aid in the
diagnostic use of the subtests if 1) the subtests were more reliable and 2)
the population for the tryout consisted of a better described sample at more
than one grade level.

Summary 

The Iowa Silent Reading Test-Elementary Edition will be of only very
limited value for use with one's classes. The test appears to cover a broader
range of skills than most elementary reading tests. But the inadequacies of
the norms, the antiquity of some of the items, the lack of validity evidence,
and the limited reliability of some of the subtests should cause one to
reject it.

• Metropolitan Achievement Tests-Reading
Overview 

The reading tests of the Metropolitan Achievement Series are part of a
larger battery of tests which spans the last half of first grade through the
ninth grade. This review will cover the reading tests of the first through
sixth grades.

The series dates back to 1932 and has been revised several times, the
most recent being the 1959 edition, which is the one used in this review.
There are four levels of tests covered in this report: Primary I, to be used in
the latter half of grade one or the beginning of grade two; Primary II, for use
in grade two; Elementary, for use in grades three and four; and
Intermediate, for use in grades five and six. An advanced form (for grades
7-9) is also available but is not reviewed here.
There are three forms available for Primary I and II and four forms for
the other levels. However, neither the "Manual for Interpreting" nor the
"Directions for Administering" presents data to indicate form
comparability. A separate summary is available from the publishers reporting
correlations between forms A and B from a single school district. These
intercorrelations are high for all levels of the reading tests. The Manual
does not make it clear whether the other data presented, for example, the
split-half reliabilities, are for all forms of the test or for just one. It is,
therefore, suggested that only forms A and B be regarded as comparable.
They may be used with confidence until such time as other data are reported.

Grade placement, percentiles, and stanines are provided for all levels,
and instructions on how to compute local stanines are given in the "Manual
for Interpreting." The "Directions for Administering" are short and easy to
read, but they do not give sufficient information for evaluating the tests.
Both documents are needed to evaluate the tests.
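The Manual's own stanine instructions are not reproduced in this review. As a hedged sketch, the conventional stanine distribution (4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of pupils) can be applied to local percentile ranks. The boundary rule (a rank falling exactly on a cut point goes to the lower stanine) and the function name are assumptions.

```python
from bisect import bisect_left

# Conventional stanine distribution: 4, 7, 12, 17, 20, 17, 12, 7, 4 percent.
# Cumulative percent at the top of stanines 1 through 8; stanine 9 holds the
# top 4 percent.
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(pr: float) -> int:
    """Convert a local percentile rank (0-100) to a stanine from 1 to 9.
    A rank falling exactly on a boundary is assigned the lower stanine."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return 1 + bisect_left(STANINE_UPPER_BOUNDS, pr)

# Hypothetical local percentile ranks for five pupils.
for pr in (3, 18, 50, 85, 99):
    print(pr, stanine_from_percentile(pr))
```

Stanine 5 covers the middle 20 percent of the local group, so a teacher can read these scores without reference to the national norming sample.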

For each level there is more than one test. Primary I, Primary II, and
Elementary contain Word Knowledge, Word Discrimination, and Reading
subtests, while Intermediate contains Word Knowledge and Reading
subtests. The Primary I battery is divided into a 35-item Word
Knowledge test which takes 15 minutes, a 35-item Word Discrimination
test which takes 12 minutes, and a 45-item Reading Test which takes 35
minutes.

The directions are clearly written and easy to follow. A watch or clock
with a second hand should be available for the timed sections. The test is
designed to measure recognition of orally presented words, student sight
vocabulary, and the student's ability to comprehend sentences and
paragraphs. The titles seem descriptive of what is required. Reading
Comprehension refers to a limited number of skills, and one should examine
the test, as the test manufacturer suggests, to determine whether the skills
taught in one's program are being measured.

One sample item is given for each test. Pupils who have limited test
experience may have difficulty following directions with only one
example. However, the reliabilities seem sufficiently high to indicate that
this problem may not occur.

The pictures are clear, and the line drawings are such that they should
minimize the chance of the student's becoming confused by overly complex
pictures. The content appears to be current and not to favor a special
population. One study provided by the test publisher did indicate, however,
quite low test-retest reliabilities (.67) on forms A and B from June to
September for Negro boys on the Reading Test. These data may be
influenced by the summer lapse. Nevertheless, results with this group should
be interpreted with care.

The publishers point out that the norming was done with groups of
pupils of average and slightly above average mental abilities. This fact
should be taken into account in interpreting student scores.

The Primary II Reading tests contain a 37-item Word Knowledge Test
which takes 18 minutes, a 35-item Word Discrimination test which takes
12 minutes, and a 51-item Reading Test which takes 35 minutes.
The Word Knowledge subtest is divided into two sections: the first asks
the child to select the correct word to match a picture; the second requires
the child to complete a sentence by choosing among four alternatives. The
example question is for the first section, and no example is given for the
second section. This omission may cause some confusion among young
children taking the test. The directions appear to be easy to follow; the line
drawings are clear and current, and the items are spaced well on the page.

The Word Discrimination subtest attempts to measure auditory and
visual discrimination ability. The teacher pronounces one word, and the
examinee is directed to find the graphic representation of that word
among the four alternatives provided. To make sure the examinee does not
misinterpret the stimulus word, the examiner presents the stimulus word
in oral context. The stimulus words and alternatives were chosen carefully.
Words containing a variety of consonants, vowels, blends, and digraphs are
represented. Format, appearance, type size, directions, and time allotment
are adequate for the population intended.

The third subtest, Reading, is divided into two parts: sentence reading
and story reading. There are 13 items in sentence reading, with each item
containing a line drawing and three sentences. The examinee demonstrates
his understanding by choosing the sentence that best describes the line
drawing. The line drawings are the same size as those used in the Word
Knowledge test. The validity of the sentence reading section of the Reading
subtest must be questioned until evidence is offered as to what this test
is measuring. All tests must select from the range of skills involved in being
able to read; the writer does not expect a general reading test to measure
everything. He does ask, however, that evidence be offered to support the
contentions that the publishers make about their tests. The story reading
section of the subtest contains 10 passages; each is followed by a number
of multiple choice questions. The manufacturer states that the questions
test main ideas, details, inferences, and specific word meanings. No
questions were asked on skills relating to organizational ability. In the
children's score box for the test, the Reading test is divided into scores for
Sentences, Stories, and Total. No evidence exists to support converting
these section scores into grade equivalent, percentile, or stanine scores.
Only the total Reading scores should be used, and it is suggested that the
manufacturer remove these columns from the score box.

The Metropolitan Achievement Test, Elementary Reading Test, contains
two subtests: Word Knowledge and Reading. The first subtest title is
descriptive of what specifically is being measured, but the second is too
broad to be of much use as a subtest title. Word Knowledge consists
of 50 items, each of which is composed of a single vocabulary word placed
in partial context and followed by five words, one of which correctly
completes the context.

The vocabulary words being tested appear to be carefully selected and
representative of third and fourth grade children's vocabularies. Each of
the four alternatives appears to have been carefully chosen and matched
within each item. The subtest progresses from relatively easy words to
more difficult words. The directions and format are easy to follow and
should not be confusing to either examiner or examinee. The time limit of
15 minutes for the 50 items in the subtest appears to be realistic in terms
of third and fourth grade children's reading ability and speed.

The Reading subtest consists of nine passages, each of which is followed
by a number of multiple choice questions. The questions appear to be
measuring main ideas, details, inferences, and specific word meanings. No
questions were asked on skills relating to organizational ability. Because
the subtest title Reading is so broad, it is difficult for the reader to
determine what this subtest should include, and therefore difficult to
appraise its validity adequately. The time limit of 22 minutes for
reading the nine passages and answering the 44 related questions appears
to be adequate in terms of the reading speed of third and fourth grade
children.

Readability of passages and questions seems appropriately controlled.
The nine passages are logically arranged from easy to difficult. The
questions are clear and concise; however, those questions testing
independent word meaning could be improved. The examinee is asked to
select an appropriate definition for a word used in the context of the story,
but the word in the story is not highlighted to help the examinee locate it.
As a result, the examinee must use precious time skimming through the
passage to find the word. The directions and format are easy to follow and
should not be confusing to either examiner or examinee. The content of the
items is not dated, nor does it appear to favor specialized backgrounds of
experience. Both subtests appear to be long enough to provide the examiner
with usable results.

The Metropolitan Intermediate Reading Test has two subtests, Word
Knowledge and Reading. Only the title of the first subtest is adequately
descriptive. The Word Knowledge Test contains 55 items. Each item is
designed to test the knowledge of a word judged to occur frequently in
children's reading material. Each word selected for the test is presented in
a minimal context. The examinee completes the item by selecting a
single word from five alternatives. Alternatives in each item are carefully
matched.

All items seem well chosen with reasonable alternatives. Fourteen
minutes are recommended for completion of the 55-item test. This time
limit requires the completion of about four items per minute, or reading at
approximately 56 words and symbols per minute. Neither of these seems
unrealistic for fifth or sixth grade children.

The second subtest, Reading, consists of seven passages, each of which
has been carefully graded by controlling vocabulary, sentence length,
sentence structure, and overall length of passages. Each passage is followed
by a series of multiple choice questions. The reviewer's analysis of the
questions indicated that they attempted to get at main ideas, details,
inferences, and individual word meanings almost exclusively. The most
noticeable omission was questions on organizational ability. Twenty-five
minutes are recommended for the completion of the seven passages and
44 related questions. Approximately 2,000 words and symbols must be read
during the 25 minutes, a task which requires a reading speed of
approximately 80 words per minute. This speed should not be too
demanding of fifth or sixth grade students. Both subtests are arranged in a
simple-to-difficult order, and both appear long enough to provide
reasonable results. The length of the two subtests in the Metropolitan
Intermediate Reading Test is at least commensurate with that of similar
subtests found in other reading tests.
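The reviewer's pacing estimates for the Intermediate test are simple rates. A minimal sketch reproducing two of them from the figures quoted above (55 items in 14 minutes; about 2,000 words and symbols in 25 minutes); the helper names are hypothetical.

```python
def items_per_minute(items: int, minutes: float) -> float:
    """Average number of items a pupil must complete each minute."""
    return items / minutes

def seconds_per_item(items: int, minutes: float) -> float:
    """Average time available for each item, in seconds."""
    return 60.0 * minutes / items

def words_per_minute(words: int, minutes: float) -> float:
    """Reading speed needed to get through `words` in the time limit."""
    return words / minutes

# Figures quoted for the Metropolitan Intermediate Reading Test.
print(round(items_per_minute(55, 14), 1))  # Word Knowledge
print(round(seconds_per_item(55, 14), 1))  # Word Knowledge
print(round(words_per_minute(2000, 25)))   # Reading
```

Word Knowledge works out to roughly 3.9 items per minute (about 15 seconds per item), and the Reading subtest to 80 words per minute, consistent with the reviewer's figures.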

The directions for administering and scoring the Metropolitan Intermediate
Reading Test are concise and complete. The clarity and language
level are appropriate for the grade levels intended. The use of color for
underlining key vocabulary words in the Word Knowledge subtest and for
item numbers and distractor numbers further facilitates understanding and
ease of administration. The format, print size, legibility, and currentness of
the item content are adequate for the grade levels intended. Readability
seems to have been carefully controlled throughout the test. Some of the
answers to the multiple choice questions appearing in the test booklet are
too crowded for efficient use of the hand scoring key provided. The hand
scoring key consists of pieces of cardboard with holes punched where the
correct answer is to appear when the answer key is overlaid upon the test
booklet. More than one answer could be seen through the holes with the
copy the writer possessed. This condition is likely to add to the confusion
of using such a scoring device and may even result in an occasional error if
the examiner is not exceedingly cautious in his scoring. The hand scoring
procedure is not recommended by the reviewer. An additional set of
directions is provided in the manual of directions for use with separate
answer sheets.

Norms 

The norming population for the Upper Primary, Elementary, and
Intermediate Reading Tests consisted of a random sample of 25 percent of the
500,000 students from 225 school systems in 49 states administering the
entire Metropolitan Achievement Test in October 1958. The sample was
controlled for age to insure normal grade placement of those in the
sample. As mentioned previously, the authors indicate that the norms are
slightly higher than would be expected with an unselected group. The
procedures used by the authors adhere to the rules and constraints of
norming. The norms do not include contributions by repeating students and
thus may be unrealistic for some schools. Sex and
socioeconomic data are not available, but further information regarding
the geographical distribution of participating schools is available upon
request from the publisher. Teachers should obtain such data prior
to purchasing tests so that test data will be of maximum use.

Validity was established by identifying the reading skills, and the levels
at which they were tested, from reviewing the related research and from
examining reading programs. The tests were then constructed to measure
the reading skills at various reading levels. No bibliography of the research
reviewed is provided, and the term "reading programs" is undefined.
Validity is discussed under "curricular considerations," and the discussion
is only one paragraph in length. Besides being inadequate in length and
content, the description is vague and raises more questions than it
answers.

Additional information on validity is found in the "Manual for
Interpreting." The discussion is a complete description of the general nature
of the problems of establishing validity but an incomplete description of
the actual validity of the reading tests in the series. An analysis of eleven
basal reading series from the New York City Board of Education
vocabulary study was used. Words were carefully controlled so as to place
them at the median level at which they occur in the basal readers. The
authors state that "extensive experimentation showed" that the sentences as
chosen would not invalidate the test results. What the nature of the
"extensive experimentation" was is difficult to determine.

A major criticism of the "Manual for Interpreting" and the "Directions
for Administering" is that they are incomplete in what they offer. Most
information about the test characteristics is included in summary
paragraphs for all grade levels in the battery. The test authors make
statements about what they believe the test to be but do not make explicit
the source of their data other than in general terms. They do rightfully
caution against trying to combine scores or using individual items to
interpret a pupil's progress. However, it would take a teacher several hours
of reading a great deal of material to find the information he needs to
critically evaluate the test and to note the necessary cautions in
interpreting the test scores. It should be noted, however, that the material
provided by the publishers reflects a great deal of analysis and work, which
makes the test a very useful measure of general reading achievement.

Reliability 

Reliability was determined by the split-half technique. Four independent
estimates of reliability were made for each test, and the ranges and
medians of the four are reported below. Each estimate was chosen to
typify a different performance level on the test. One hundred subjects at
grade level 3.1 for the Upper Primary Reading Test, grade level 4.1 for the
Elementary Reading Test, and grade level 6.1 for the Intermediate Reading
Test were randomly selected from each of four school systems to
participate in the reliability studies. A total of 400 students participated in
the reliability studies for each test. All the reliabilities are high and quite
satisfactory for all levels of the test, being .90 or above. No correlations of
the Metropolitan series with other reading tests are presented in either of
the two documents.

It appears that the validity of the reading tests is based on readability
analysis using the Lorge and Flesch formulas and on the reliability data.
Other data to support the test authors' claims are desired. It should be
noted that extensive support is presented for the Spelling test. Similar
support is desired for the Reading subtests.
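The manual does not show how its readability analysis was carried out. As a hedged illustration only, the widely published Flesch Reading Ease formula, 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word), is sketched below. Whether the publishers used this particular Flesch formula is an assumption, and the Lorge formula is not shown. Counts are supplied by hand because automatic syllable counting is unreliable; the function name and the counts are hypothetical.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease from hand counts; higher scores mean easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Hypothetical hand counts for a 100-word passage.
print(round(flesch_reading_ease(100, 10, 130), 1))
```

For these hypothetical counts the score is about 86.7, in the 80 to 90 range that Flesch labeled "easy."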

Standard score, stanine, percentile rank, and grade equivalent tables are
available for score interpretation. The manual of directions provides an
outstanding discussion of the merits and limitations of each, in an attempt
to indicate how the test scores may be used to improve the services of the
school to the child. The section "Use of Results" suggests a number of
purposes for which the obtained data might prove beneficial to classroom
teachers, principals, administrators, and supervisors.

Summary

The Metropolitan Test Battery includes a range of levels of reading tests
which should be very useful to measure general reading achievement. All
the tests appear to be very reliable. The Word Knowledge and Word
Discrimination vocabularies have been carefully controlled and appear to be
measures of the content as taught in most basal readers. However, no data
are offered to show that the skills named are actually measured by the
Reading subtests. A careful analysis must be made by the teacher to insure
that these tests match the program as taught in individual classrooms, a
procedure suggested for any achievement test, particularly when such data
are not available. The tests are attractive, current, and both easy
to administer and score. Percentiles, stanines, and grade equivalent scores
are available, as are suggestions on how to use the test results to
improve a classroom program.

• Stanford Achievement Tests-Reading
Overview

The Stanford Achievement Tests-Reading are part of an achievement
series designed to measure the major academic areas of the elementary and
junior high curriculum. The present tests, which were published in 1964,
represent the fifth revision. This review will consider only the reading
subtests of the four batteries used at various grade levels in the elementary
grades.

The four batteries are Primary I, used with students from the middle of
first grade to the middle of second grade; Primary II, used with students
from the middle of grade two to the end of grade three; Intermediate I,
used with students from the beginning of grade four to the middle of grade
five; and Intermediate II, used with students from the middle of grade five
to the end of grade six. Each of these tests includes subtests for measuring
word reading and paragraph meaning. In addition, Primary I and II and
Intermediate I have a Word Study Skills subtest. Three forms (W, X, Y) are
available for the Primary tests and four forms (W, X, Y, Z) are available for
the Intermediate tests.

The directions for administering the tests are clear and concise and,
consequently, should simplify one's administration of the tests and help to
insure the test had been normed at the three different periods in the
school year represented by the tables.

Each of the subtests is a timed test. The publisher suggests that the time
limits ". . . are generous and calculated to give practically all pupils
sufficient time to attempt all questions which the pupils are capable of
answering correctly." No evidence is given to support this statement; one
may find that some of the slower readers at every grade level are unable to
complete the tests. For example, on the Word Meaning subtest of
Intermediate II, the student is to read an incomplete sentence and select
from four alternatives the correct word to complete the sentence. There
are 48 of these items to be completed in 12 minutes, an average of 15
seconds per item. For the Paragraph Meaning subtest of the same test, the
student is to read a total of 24 selections ranging in length from one
sentence of only ten words to multiple-sentence paragraphs of up to 75
words. There are a total of 64 multiple choice items for these paragraphs.
The total testing time is 30 minutes; this time allows an average of
slightly less than one minute for reading each selection and answering from
one to five multiple choice questions.

Primary I includes three reading subtests. The Word Reading subtest
measures the student's ability to match a picture with one of four words.
Generally, the pictures do not seem to be overly biased toward a middle
class population, and they are clear and easy to interpret. The Paragraph
Meaning subtest contains 33 paragraphs with a total of 38 blanks in the
paragraphs. The pupil is to supply the correct word for each blank from
four alternatives. Several of the items call for the understanding of a single
word. For this reason there is probably a great deal of similarity between
this subtest and the Word Meaning subtest. Supporting this point is the
fact that the correlation of the two subtests for first grade children is .72.
Because of this high similarity, the two subtests should never be used as
measures of distinct reading skills but only as indications of general
reading ability.

There are four separate parts in the Word Study Skills subtest. All of
the parts measure the pupil's ability to match written symbols with spoken
sounds. The test utilizes matching beginning sounds of words and letters,
matching ending sounds of words and letters, matching a spoken word
with a written rhyming word, and matching a spoken word with its written
form. The test correlates .73 and .67 with Word Reading and Paragraph
Meaning, respectively. Again, one is strongly cautioned against any attempt
to utilize this score diagnostically.

For Primary II, the correlations between the Word Meaning and Paragraph
Meaning subtests are even higher than for Primary I: for both second and
third graders the correlation is .83. Again, one is strongly cautioned
against using these subtests as diagnostic measures of distinct reading
skills. In fact, the test publisher should not even provide separate scores
for the subtests but should instead combine them into a single reading
score.

The Word Meaning subtest of Primary II measures the pupil's ability to
pick from four alternatives the final word of an incomplete sentence.
Some of the items seem to measure skills other than word meaning: one
item tests the student's knowledge of the number of items in a dozen;
another depends on whether the student knows the name of a specific
country. The Paragraph Meaning subtest uses the same procedure as the
Primary I test: the pupil is to supply the missing word in a paragraph,
choosing from four alternatives for each blank.

The Word Study Skills subtest is divided into three parts. The first two
parts are auditory discrimination tests for beginning and ending sounds;
the third part measures the pupil's ability to match the underlined part
of a word to a word that has the same sound. For this third part, no words
are pronounced for the pupil. The correlations of this subtest with Word
Meaning and Paragraph Meaning are M and .73 at grades two and three.

The Intermediate I subtests follow the same form as Primary II;
Intermediate II is also the same, but it does not include a word study
subtest. As one might expect, there are extremely high correlations between
the Word Meaning and Paragraph Meaning subtests. For Intermediate I at
grade 4, the correlation is .82; for Intermediate II at grades 5 and 6, the
correlations are .83. Again, one must not attempt to use these separate
subtests for any diagnostic purposes. The Word Study Skills subtest of
Intermediate I also correlates very highly with Word Meaning (J \) and
Paragraph Meaning (.73) at grade four. Several of the items on the Word
Meaning subtest of both Intermediate I and Intermediate II seem to measure
knowledge other than word meaning.

Norms 

The norming population for the Stanford Achievement Test is a carefully
selected stratified sample of the total student population of the
United States. On request, the publisher will provide a booklet entitled
Stanford Achievement Test: A Supplementary Report on the Norm
Group, which describes the norm group in detail. Anyone using the norm
tables in the test manual will probably want to consult these descriptions
to see how a specific group compares with the national population on such
characteristics as economic level, region, and size and location of
community. As suggested for other tests, it would also be very useful to
develop local norms. Nevertheless, the norms provided by the test publisher
are as representative of actual national student performance as those of
any other published test available.
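Developing local norms, as suggested above, can be as simple as converting each raw score into a percentile rank within the local group. The Python sketch below shows one common convention (the percentage of scores below, plus half the percentage tied); the function names and the sample scores are hypothetical illustrations, not part of the Stanford materials.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, local_scores):
    """Percentile rank of `score` within a local group of raw scores.

    Mid-point convention (an assumption; other conventions exist):
    percentage of scores below `score` plus half the percentage equal to it.
    """
    ordered = sorted(local_scores)
    below = bisect_left(ordered, score)            # scores strictly lower
    equal = bisect_right(ordered, score) - below   # scores tied with `score`
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def local_norm_table(local_scores):
    """Map each distinct raw score in the local group to a rounded percentile rank."""
    return {s: round(percentile_rank(s, local_scores)) for s in sorted(set(local_scores))}


# Hypothetical raw scores for one local class.
scores = [12, 15, 15, 18, 20, 22, 22, 22, 25, 30]
print(local_norm_table(scores))
```

A local table of this kind supplements the publisher's national norms rather than replacing them; with a group this small, the ranks would shift noticeably as new scores are added.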

Form X, however, was the only form of the test standardized by the
publisher. The other forms were equated to Form X in a study involving
seven school systems. Because no information is reported on how these
forms correlate with Form X, there is not the same assurance that the norms
for the other forms are as representative of national achievement as those
for Form X. For this reason, if only one form of the test is needed, use
Form X.

Reliability 

The reported reliability coefficients for the reading subtests indicate
that one can be fairly certain that the score a student receives on one day
will be quite similar to the score he receives on another day. These
reliabilities, however, were based only on Form X of the test and were all
determined by the split-half procedure. The publisher should have reported
the correlations of Form X with the other forms of the test. This
information was probably available from the study in which the publisher
equated the forms, but it was not reported in the technical manual.
Timing a test can also artificially inflate split-half reliabilities.
Since most of the Stanford subtests are timed, the publisher should have
conducted a study to determine whether timing did in fact affect the
reliabilities. This information was reported for the 1953 edition, but not
for the 1964 edition. In general, one can be surer of the reliability of
Form X than of the other forms of the test, and students should be
carefully observed during testing to see whether they have ample time to
complete the tests.
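Because the reported coefficients come from the split-half procedure, it may help to see what that procedure computes. The Python sketch below assumes an odd-even split of right/wrong item scores (how the publisher actually formed the halves is not described here) and applies the standard Spearman-Brown correction, r_full = 2r / (1 + r), which estimates full-length reliability from the correlation r between the two halves. The function names and response data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability.

    item_scores: one list of 0/1 item scores per student, in test order.
    """
    odd = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd, even))


# Hypothetical responses of four students to a four-item test.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
print(round(split_half_reliability(responses), 3))  # 0.625
```

The sketch also suggests why timing matters: on a speeded test, items a student never reaches are scored wrong in both halves alike, so the two half scores agree more closely than the student's actual reading skill would warrant.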

Validity 

The best procedure for determining whether the Stanford Reading Tests are
valid measures of reading for specific purposes is to compare the content
and format of the tests with the instructional program. The procedures
followed in developing the content outline for the test are described in
the technical manual and should be carefully studied. The careful tryout
and review of items by a variety of reading specialists, classroom
teachers, and test developers have probably helped improve the test
content. It is strongly recommended, however, that the subtests not be
used for diagnostic purposes. The publishers have not developed the reading
subtests for these purposes, and no diagnostic validity evidence for the
subtests is presented. In fact, the intercorrelations of the subtests
indicate that they all seem to be measuring the same general reading
ability.

The correlations of the reading subtests with the Otis Quick-Scoring
Mental Ability Test indicate that the Otis test and the reading tests
measure quite different abilities at the lower grade levels, but at the
upper grade levels the measured abilities appear more similar. This result
is in keeping with studies of other tests indicating that, once a student
has mastered the basic skills of reading, measures of intelligence and
reading are quite similar. Even so, the correlations at all grades are low
enough that both an intelligence test and the Stanford reading tests can
validly be used to identify students who need reading improvement, based
on discrepancies between reading ability and mental ability.
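The point about discrepancies can be made concrete with the classical formula for the reliability of a difference between two standardized scores X and Y: r_D = ((r_XX + r_YY) / 2 - r_XY) / (1 - r_XY), where r_XX and r_YY are the reliabilities of the two tests and r_XY is their intercorrelation. The Python sketch below assumes equal score variances (for example, both scores expressed as standard scores); the reliability and correlation values in the example are hypothetical, not taken from the Stanford or Otis manuals.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Reliability of X - Y for two standardized scores (equal variances assumed).

    r_xx, r_yy: reliabilities of the two tests
    r_xy: correlation between the two tests
    """
    if r_xy >= 1:
        raise ValueError("tests that correlate perfectly leave no reliable difference")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Hypothetical values: two tests, each with reliability .90,
# compared at a moderate and at a high intercorrelation.
for r in (0.50, 0.80):
    print(r, round(difference_score_reliability(0.90, 0.90, r), 2))
# 0.5 0.8
# 0.8 0.5
```

A discrepancy is dependable only when the two tests correlate well below their own reliabilities. By the same reasoning, subtests that intercorrelate in the .80s, such as Word Meaning and Paragraph Meaning, yield difference profiles too unreliable for diagnosis.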

Summary 

The Stanford Achievement Tests: Reading are carefully constructed
tests for measuring general reading ability. The test norms represent an
outstanding effort to develop truly representative national norms. One
should find these tests quite useful in comparing one's students with
national achievement levels. The subtests should not be used diagnostically.

The lack of complete data regarding the comparability of all forms
leads the writer to recommend the use of Form X whenever only one form
of the test is needed. This does not mean that the other forms should not
be used, only that the reliability and norming data for them are less
complete.

This test series is one of the better tests on the market and should be
quite useful for measuring the general reading achievement of students.
Appendix



The sets of charts on the following pages have been prepared as a
summary of the critical points discussed more fully in the text.

The first set presents a general description of the reading readiness and
reading achievement tests. This summary includes the grade levels for
which each test is intended, the subtest names, and the approximate time
it takes to administer the instrument, as well as the names of the test
author and manufacturer.

The second set of charts summarizes the technical evaluation of the
tests, based on the technical manuals and reports provided by the
publishers. A quick perusal will reveal that each test has some strengths
and some alarming weaknesses. It cannot be emphasized enough that a
commercial test should be carefully evaluated. Many are attractive and
time-saving, and most claim to measure specific reading skills. However,
sufficient evidence to support the assertion that the subtests measure the
skills implied by their titles is almost completely missing in all of the
tests reviewed. Only the total test score and subtest scores (one or more
scales) seem to be reliable enough to be used with children, and even here
some tests lack evidence to support their claims.

All of the reading tests reviewed in this book measure general reading or
readiness skills. In spite of their titles, the tests are of little
diagnostic value, and the so-called diagnostic charts included in many of
the test manuals can be misleading if used.

The major weakness of the commercial tests is also their major strength.
As global measures of reading behavior they are excellent, giving a
reliable and valid estimate of the achievement range of the children in a
class in comparison with a larger group. The norms for most of these tests
are generally representative of national achievement, and the more recent
tests are greatly improved in this regard. The standardized procedures for
administering the tests are nearly perfect in the clarity of the
directions provided. In addition, advanced technical methods are being
applied to most tests.

Teachers, the writer predicts, will want more sophisticated measures as
their knowledge of how a test should be used increases. Hopefully, grade
level scores, short unreliable scales, and meaningless diagnostic outlines
will be removed from the tests through the joint efforts of teachers and
test manufacturers.